CN111145910A

CN111145910A - Abnormal case identification method and device based on artificial intelligence and computer equipment

Info

Publication number: CN111145910A
Application number: CN201911275089.1A
Authority: CN
Inventors: 李何言; 王玉婷
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Ping An Medical and Healthcare Management Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-05-12

Abstract

The application relates to big data and provides an artificial-intelligence-based abnormal case identification method applied to a platform server. The method comprises the following steps.

- Acquire diagnosis and treatment item data for the same disease type. Taking the diagnosis and treatment time as the unit, construct texts in which the diagnosis and treatment item data serve as words. Each constructed text contains the diagnosis and treatment item data of different users with that disease type in the corresponding diagnosis and treatment time period.
- Build a topic model for each text to obtain its topic vector distributions. Each topic vector distribution corresponds to one topic and comprises a plurality of word identifiers with corresponding word weights. Each topic has a different user topic weight for each user.
- Align and cluster the topic vector distributions of the texts to obtain each user's diagnosis and treatment item categories at different times. These categories form the user's diagnosis and treatment sequence.
- Input each user's diagnosis and treatment sequences over the different diagnosis and treatment times into a process mining model to obtain the standard clinical path for the disease type, and identify abnormal cases against that path.

Description

Abnormal case identification method and device based on artificial intelligence and computer equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to an abnormal case identification method and apparatus based on artificial intelligence, a computer device, and a storage medium.

Background

As the reform and development of the medical insurance system continue, medical insurance fraud is becoming increasingly serious. Fraud takes many forms, including excessive medical treatment, illegal prescribing, split (decomposed) hospitalization, false name registration, fraudulent swiping of medical insurance cards, and illegal reimbursement. The current medical insurance system lacks effective countermeasures against this serious threat.

Traditionally, each medical institution identifies abnormal cases on its own local server with a self-defined algorithm. For example, a first medical institution runs a first identification algorithm on a first hospital server. Each self-defined algorithm covers only a narrow identification range and has poor accuracy, so suspicious cases are hard to cover efficiently. In addition, every institution must deploy its identification algorithm on a local server, which wastes a large amount of computing resources.

Disclosure of Invention

It is therefore necessary to provide an artificial-intelligence-based abnormal case identification method, device, computer device, and storage medium. These identify abnormal cases through a unified platform server, which improves identification accuracy and range and saves computing resources.

An abnormal case identification method based on artificial intelligence is applied to a platform server, and the method comprises the following steps:

acquiring diagnosis and treatment item data for the same disease type and, taking the diagnosis and treatment time as the unit, constructing texts in which the diagnosis and treatment item data serve as words, wherein each constructed text contains the diagnosis and treatment item data of different users with that disease type in the corresponding diagnosis and treatment time period;

building a topic model for each text to obtain the topic vector distributions of each text, wherein each topic vector distribution comprises a plurality of word identifiers and corresponding word weights and corresponds to one topic, and each topic has a different user topic weight for each user;

aligning the topic vector distributions of the texts, clustering the aligned topic vector distributions to obtain each user's diagnosis and treatment item categories at different times, and forming each user's diagnosis and treatment sequence from those categories;

and inputting each user's diagnosis and treatment sequences over the different diagnosis and treatment times into a process mining model to obtain the standard clinical path for the disease type, and identifying abnormal cases according to the standard clinical path.

In one embodiment, the method further comprises:

acquiring the diagnosis and treatment item data and case description information of a plurality of different users with the same disease, and combining each user's diagnosis and treatment item data and case description information into a diagnosis and treatment sample for that user;

reducing the dimensionality of each user's diagnosis and treatment sample by deep-neural-network representation learning to extract a context attribute vector;

and performing outlier detection on each user's context attribute vector, calculating the abnormal score of each diagnosis and treatment sample, and identifying samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples.

In one embodiment, the step of performing outlier detection on each user's context attribute vector, calculating the abnormal score of each diagnosis and treatment sample, and identifying samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples includes:

calculating the distance between the context attribute vectors of every pair of users, and collecting the users whose distance from a first user is smaller than a preset threshold into an associated user set;

determining a predicted diagnosis and treatment cost grade for the first user from the actual diagnosis and treatment cost grades of the users in the associated user set;

calculating the abnormal score of the first user's diagnosis and treatment sample from the difference between the first user's actual and predicted diagnosis and treatment cost grades, wherein the abnormal score is proportional to the difference.

In another embodiment, the same step includes:

clustering the users' context attribute vectors to obtain a plurality of clusters;

obtaining the capacity (number of samples) of each cluster, sorting the clusters by capacity, and identifying a preset proportion of the clusters, taken in ascending order of capacity, as abnormal clusters;

determining the abnormal scores of the diagnosis and treatment samples in the abnormal clusters according to cluster capacity, and identifying samples whose abnormal score exceeds the threshold as abnormal diagnosis and treatment samples.

In one embodiment, the method further comprises:

cleaning the diagnosis and treatment item data and handling its missing values;

discretizing categorical fields and continuous-value fields in the diagnosis and treatment item data, and standardizing its continuous variables;

and merging diagnosis and treatment items in the diagnosis and treatment item data according to their similarity.

An artificial-intelligence-based abnormal case identification device, applied to a platform server, comprises:

a text construction module, configured to acquire diagnosis and treatment item data for the same disease type and, taking the diagnosis and treatment time as the unit, construct texts in which the diagnosis and treatment item data serve as words, each constructed text containing the diagnosis and treatment item data of different users with that disease type in the corresponding diagnosis and treatment time period;

a topic vector distribution module, configured to build a topic model for each text to obtain the text's topic vector distributions, each topic vector distribution comprising a plurality of word identifiers and corresponding word weights and corresponding to one topic, and each topic having a different user topic weight for each user;

a diagnosis and treatment sequence determination module, configured to align the topic vector distributions of the texts, cluster the aligned distributions to obtain each user's diagnosis and treatment item categories at different times, and form each user's diagnosis and treatment sequence from those categories;

and an abnormal case identification module, configured to input each user's diagnosis and treatment sequences over the different diagnosis and treatment times into the process mining model to obtain the standard clinical path for the disease type, and identify abnormal cases according to the standard clinical path.

In one embodiment, the apparatus further comprises:

an abnormal diagnosis and treatment sample identification module, configured to perform three functions:

- acquire the diagnosis and treatment item data and case description information of a plurality of different users with the same disease type, and combine each user's item data and case description information into a diagnosis and treatment sample for that user;
- reduce the dimensionality of each user's sample by deep-neural-network representation learning to extract a context attribute vector;
- perform outlier detection on each user's context attribute vector, calculate the abnormal score of each sample, and identify samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples.

In one embodiment, the abnormal diagnosis and treatment sample identification module is further configured to perform the following steps:

- calculate the distance between the context attribute vectors of every pair of users, and collect the users whose distance from a first user is smaller than a preset threshold into an associated user set;
- determine the first user's predicted diagnosis and treatment cost grade from the actual cost grades of the users in that set;
- calculate the abnormal score of the first user's sample from the difference between the first user's actual and predicted cost grades, the abnormal score being proportional to the difference.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

The above artificial-intelligence-based abnormal case identification method, device, computer equipment, and storage medium are applied to a platform server. Separate texts are constructed from the diagnosis and treatment item data of different users with the same disease type on different diagnosis and treatment dates. Topic extraction and clustering then yield each user's diagnosis and treatment item categories at different times, and these categories form that user's diagnosis and treatment sequence. A process mining model derives the standard clinical path for the disease type from the sequences, and abnormal cases are identified against that path. This approach standardizes the treatment path for each disease type and improves the accuracy and standardization of the generated standard clinical path. The platform server identifies abnormal cases from all regions uniformly, which brings four benefits: higher identification accuracy, a wider identification range, greater automation and efficiency, and lower use of computing resources.

Drawings

FIG. 1 is a diagram of an exemplary application environment for an artificial intelligence based abnormal case identification method;

FIG. 2 is a schematic flow chart illustrating an artificial intelligence based abnormal case identification method according to an embodiment;

FIG. 3 is a diagram illustrating distribution of topic vectors in one embodiment;

FIG. 4 is a block diagram of an apparatus for identifying abnormal cases based on artificial intelligence in one embodiment;

FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The artificial-intelligence-based abnormal case identification method can be applied in the application environment shown in FIG. 1, which is a diagram of the application environment in which the method operates in one embodiment. As shown in FIG. 1, the application environment includes a terminal 110, a terminal 120, a platform server 130, a first hospital server 140, and a second hospital server 150. The terminals and servers communicate through a network, which may be wireless or wired, such as an IP network or a cellular mobile communication network. The numbers of terminals and servers are not limited.

Terminals 110 and 120 may be, but are not limited to, personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. Each server may be implemented as a stand-alone server or as a cluster of servers. The platform server 130 performs the following steps:

- Acquire diagnosis and treatment item data for the same disease type from the first hospital server 140 and/or the second hospital server 150.
- Taking the diagnosis and treatment time as the unit, construct texts in which the diagnosis and treatment item data serve as words. Each text contains the diagnosis and treatment item data of different users with that disease type in the corresponding time period.
- Build a topic model for each text to obtain the text's topic vector distributions. Each topic vector distribution comprises a plurality of word identifiers and corresponding word weights and corresponds to one topic, and each topic has a different user topic weight for each user.
- Align the topic vector distributions of the texts, cluster the aligned distributions to obtain each user's diagnosis and treatment item categories at different times, and form each user's diagnosis and treatment sequence from those categories.
- Input each user's diagnosis and treatment sequences over the different diagnosis and treatment times into a process mining model to obtain the standard clinical path for the disease type, and identify abnormal cases according to that path.

The platform server 130 may also receive an electronic bill reimbursement request sent by terminal 110 or terminal 120. It checks whether the case is abnormal based on the current disease information in the request and determines the reimbursement result from the identification result.

In one embodiment, as shown in FIG. 2, an artificial-intelligence-based abnormal case identification method is provided. It is described here as applied to the platform server 130 in FIG. 1, and includes the following steps:

Step 210, acquiring diagnosis and treatment item data for the same disease type and, taking the diagnosis and treatment time as the unit, constructing texts in which the diagnosis and treatment item data serve as words, wherein each constructed text contains the diagnosis and treatment item data of different users with that disease type in the corresponding diagnosis and treatment time period.

The diagnosis and treatment item data are the actual settlement data for each diagnosis and treatment item. They include the user name and the charge item details, such as the diagnosis and treatment time, the item name, the cost, tests and examinations, medicines, and consumables.

Specifically, the diagnosis and treatment item data of each user with the same disease type are acquired. The data are grouped by diagnosis and treatment time using a preset time interval, for example one day, which yields the data for the first day, the second day, ..., and the nth day of treatment. All diagnosis and treatment item data of all users with that disease type on the nth day of hospitalization form the text for that day. In one embodiment, each text is constructed only from diagnosis and treatment item data that share the same region identifier, so the same disease type yields different texts for different region identifiers. The texts then better reflect the actual conditions of the local region or hospital, and the standard clinical path matches the region and hospital more closely.
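The day-based text construction can be sketched in Python as follows. This is a minimal sketch under two assumptions: a hypothetical record schema with `user`, `region`, `date`, and `item` keys, and a known admission date for each user. Each day's text keeps its items grouped by user, because a later step needs per-user topic weights. The optional `region` filter mirrors the region-identifier embodiment.

```python
from collections import defaultdict
from datetime import date


def build_day_texts(records, admission_dates, region=None):
    """Build one 'text' per day of stay for a single disease type.

    records: iterable of dicts with hypothetical keys user, region, date, item.
    admission_dates: {user: date of admission}.
    Returns {day_index: {user: [item words]}}, where day 1 is the admission day.
    """
    texts = defaultdict(lambda: defaultdict(list))
    for r in records:
        if region is not None and r["region"] != region:
            continue  # keep only records sharing the requested region identifier
        day = (r["date"] - admission_dates[r["user"]]).days + 1
        texts[day][r["user"]].append(r["item"])
    # Convert the nested defaultdicts to plain dicts, ordered by day.
    return {d: dict(users) for d, users in sorted(texts.items())}


records = [
    {"user": "u1", "region": "R1", "date": date(2019, 1, 1), "item": "blood_routine"},
    {"user": "u1", "region": "R1", "date": date(2019, 1, 2), "item": "chest_ct"},
    {"user": "u2", "region": "R1", "date": date(2019, 3, 5), "item": "blood_routine"},
    {"user": "u3", "region": "R2", "date": date(2019, 3, 5), "item": "sputum_culture"},
]
admissions = {"u1": date(2019, 1, 1), "u2": date(2019, 3, 5), "u3": date(2019, 3, 5)}
texts = build_day_texts(records, admissions, region="R1")
# {1: {"u1": ["blood_routine"], "u2": ["blood_routine"]}, 2: {"u1": ["chest_ct"]}}
```

Day 1 pools users admitted on different calendar dates, because the text unit is the day of stay, not the calendar day.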

Step 220, constructing a topic model for each text to obtain topic vector distribution corresponding to each text, wherein each topic vector distribution comprises a plurality of word identifications and corresponding word weights, each topic vector distribution corresponds to a topic, and each topic has different user topic weights relative to different users.

Specifically, LDA (Latent Dirichlet Allocation) analysis is performed on the item names in each text to obtain that text's topic vector distributions. LDA is an unsupervised machine learning technique for identifying latent topic information in a collection of documents. A topic is a set of words aggregated by clustering. One document can correspond to several topics, that is, belong to several types, and a topic can contain several words, each with its own probability. The words of each text are input into the LDA model, which learns a plurality of topics without supervision, and these topics form a topic set. Each topic corresponds to one topic vector distribution. A topic vector distribution covers a plurality of words, each with a probability calculated by the LDA model.
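The patent names LDA but no particular implementation. The following self-contained sketch fits LDA with collapsed Gibbs sampling, which is one standard fitting method. It treats each user's item list within one day's text as a separate document, which is an assumption. The values of `alpha`, `beta`, and the iteration count are illustrative and do not come from the source. In the output, `topic_word[k]` plays the role of the k-th topic vector distribution, mapping each word identifier to its word weight.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word strings (here assumed to be one
    user's items within one day's text). Returns (topic_word, doc_topic), where
    topic_word[k] maps each word to its weight in topic k and doc_topic[d] lists
    the topic proportions of document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vsize = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}
    n_kw = [[0] * vsize for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                            # tokens per topic
    n_dk = [[0] * num_topics for _ in docs]           # document-topic counts
    z = []                                            # topic of every token

    # Randomly assign every token to an initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_kw[k][v] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + vsize * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_kw[k][v] += 1
                n_k[k] += 1
                n_dk[d][k] += 1

    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + vsize * beta) for v in range(vsize)}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic
```

The per-document proportions in `doc_topic` are one natural candidate for the user topic weights. The text, however, leaves the exact rule open.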

As shown in FIG. 3, one text has three topic vector distributions, each comprising a plurality of word identifiers and corresponding word weights. The topic of each distribution is obtained by a weighted summation over its word identifiers and word weights, and these topics together form the text's topic set. In FIG. 3, the three distributions correspond to topic A, topic B, and topic C. Suppose there are two users, user 1 and user 2. Topic A then has a first weight for user 1 and a second weight for user 2. A user's topic weight for a given topic can be determined from the topic vector distribution and that user's diagnosis and treatment item data.
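The text states only that a user topic weight is determined from the topic vector distribution and the user's item data. The sketch below shows one hypothetical rule for illustration. It sums the word weights of the user's items within each topic and normalizes the results to sum to 1. The function name and the uniform fallback are assumptions.

```python
def user_topic_weights(topic_dists, user_items):
    """Hypothetical rule: a user's weight for topic k is the summed weight, in topic k,
    of the items the user received, normalised so the weights sum to 1.

    topic_dists: list of {word: weight} dicts (one per topic of the day's text).
    user_items: list of item words the user received in that text.
    """
    raw = [sum(dist.get(item, 0.0) for item in user_items) for dist in topic_dists]
    total = sum(raw)
    if total == 0:
        # None of the user's items appear in any topic: fall back to uniform weights.
        return [1.0 / len(topic_dists)] * len(topic_dists)
    return [r / total for r in raw]


# Topics A, B, C of one day's text (illustrative weights).
topics = [{"a": 0.7, "b": 0.3}, {"c": 0.6, "d": 0.4}, {"a": 0.1, "c": 0.9}]
weights_user1 = user_topic_weights(topics, ["a", "b"])  # topic A dominates
```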

Step 230, aligning the topic vector distributions of the texts and clustering the aligned topic vector distributions to obtain each user's diagnosis and treatment item categories at different times, wherein each user's categories form that user's diagnosis and treatment sequence.

Alignment means taking the union of the words in all topic vector distributions to obtain an aligned word set. Every aligned topic vector distribution covers all words in this set.

Specifically, if a first word in the aligned word set does not appear in a topic vector distribution before alignment, its word weight in that aligned distribution is 0. The aligned distributions are then clustered with k-means to obtain each user's diagnosis and treatment item categories. All users share the same topic vector distributions on a given day, but those distributions carry different user topic weights for different users. Clustering the aligned distributions therefore yields different diagnosis and treatment item categories for different users on the same day. For example, on the first day of treatment user 1 may correspond to categories A and B, while user 2 corresponds to categories B and C. Each user's diagnosis and treatment item categories form that user's diagnosis and treatment sequence.
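The following is a minimal sketch of alignment followed by k-means. It rests on several assumptions:

- All topics from all day texts are aligned and clustered together, so category labels are comparable across days.
- Plain Lloyd's k-means stands in for whatever k-means variant is actually used.
- A user is assigned a topic's category on a day when that user's topic weight is at least a hypothetical `min_weight` of 0.3.

The function names are illustrative.

```python
import random


def align(topic_dists):
    """Union the words of all topic vector distributions into one aligned word set;
    a word missing from a distribution gets weight 0.0 in its aligned vector."""
    vocab = sorted({w for dist in topic_dists for w in dist})
    return vocab, [[dist.get(w, 0.0) for w in vocab] for dist in topic_dists]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: _sq_dist(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diagnosis_sequences(day_topics, day_user_weights, k, min_weight=0.3):
    """day_topics: {day: [topic dict, ...]}; day_user_weights: {day: {user: [weight per topic]}}.
    Returns {user: [(day, sorted category labels), ...]} in day order."""
    days = sorted(day_topics)
    flat = [dist for d in days for dist in day_topics[d]]
    _, vectors = align(flat)
    labels = kmeans(vectors, k)
    sequences, pos = {}, 0
    for d in days:
        day_labels = labels[pos:pos + len(day_topics[d])]
        pos += len(day_topics[d])
        for user, weights in day_user_weights.get(d, {}).items():
            cats = sorted({day_labels[t] for t, w in enumerate(weights) if w >= min_weight})
            sequences.setdefault(user, []).append((d, cats))
    return sequences
```

All topics are clustered in one pass rather than per day. This design choice lets a category such as "A" mean the same thing on day 1 and day 5, which the later path-mining step relies on.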

Step 240, inputting each user's diagnosis and treatment sequences over the different diagnosis and treatment times into a process mining model to obtain the standard clinical path for the disease type, and identifying abnormal cases according to the standard clinical path.

Specifically, the standard clinical path includes the target diagnosis and treatment item category that the process mining model determines for each diagnosis and treatment time. The target diagnosis and treatment items for each time are determined from these categories. Ordering them by diagnosis and treatment time gives the standard clinical path for the disease type. For example, if 90% of patients received items of category A on the first day of treatment, then the items of category A are the target diagnosis and treatment items of the standard clinical path for that day.
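The text does not name the process mining algorithm. The sketch below is a deliberately simplified, frequency-based stand-in. It is not a process discovery algorithm such as the alpha or heuristics miner. It reproduces the 90% example: a category becomes a target for a given day when at least `min_support` of all patients received it on that day. The input format matches the `{user: [(day, categories), ...]}` shape assumed for the diagnosis and treatment sequences.

```python
from collections import Counter


def standard_clinical_path(sequences, min_support=0.9):
    """Frequency-based stand-in for the process mining step.

    sequences: {user: [(day, [category, ...]), ...]}
    Returns [(day, [target categories])], keeping a category on a day when the share
    of all patients who received it that day is at least min_support.
    """
    n_patients = len(sequences)
    per_day = {}
    for seq in sequences.values():
        for day, cats in seq:
            # Count each category once per patient per day.
            per_day.setdefault(day, Counter()).update(set(cats))
    path = []
    for day in sorted(per_day):
        targets = sorted(c for c, n in per_day[day].items() if n / n_patients >= min_support)
        if targets:
            path.append((day, targets))
    return path
```

For example, if 9 of 10 patients have category "A" on day 1, then "A" becomes the day-1 target at the default 0.9 support.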

Following the path flow chart of the standard clinical path, the quantities of all target diagnosis and treatment items on the path are counted and their costs calculated. This gives the standard diagnosis and treatment cost for the disease. If the difference between a first case's diagnosis and treatment cost and the standard cost exceeds a threshold, the first case is an abnormal case with abnormal diagnosis and treatment behavior. In addition, the similarity between a first case of the same disease and the standard clinical path can be calculated. If this similarity is smaller than a preset threshold, the first case is an abnormal case.
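The following is a hedged sketch of the two checks. The standard cost is computed as quantity times unit price, summed over the path's target items. The text does not specify a similarity measure, so mean per-day Jaccard similarity is used here as an assumption. Both thresholds are supplied by the caller, and all function names are illustrative.

```python
def standard_cost(path_item_quantities, unit_prices):
    """Standard cost = sum of quantity x unit price over the path's target items."""
    return sum(q * unit_prices[item] for item, q in path_item_quantities.items())


def path_similarity(case_path, std_path):
    """Mean per-day Jaccard similarity between two paths given as {day: set of categories}
    (an assumed measure; the source does not fix one)."""
    days = set(case_path) | set(std_path)
    if not days:
        return 1.0
    total = 0.0
    for d in days:
        a, b = set(case_path.get(d, ())), set(std_path.get(d, ()))
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(days)


def is_abnormal(case_cost, std_cost, case_path, std_path, cost_threshold, sim_threshold):
    """Flag a case whose cost deviates too far from the standard or whose path is too dissimilar."""
    cost_flag = abs(case_cost - std_cost) > cost_threshold
    path_flag = path_similarity(case_path, std_path) < sim_threshold
    return cost_flag or path_flag
```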

The above artificial-intelligence-based abnormal case identification method is applied to a platform server. Separate texts are constructed from the diagnosis and treatment item data of different users with the same disease type on different diagnosis and treatment dates. Topic extraction and clustering yield each user's diagnosis and treatment item categories at different times, and each user's categories form that user's diagnosis and treatment sequence. A process mining model then produces the standard clinical path for the disease type, and abnormal cases are identified according to that path. This standardizes the treatment path for each disease type and improves the accuracy and standardization of the generated standard clinical path. The platform server identifies abnormal cases in all regions uniformly, which brings four benefits: higher identification accuracy, a wider identification range, greater automation and efficiency, and lower use of computing resources.

In one embodiment, the method further comprises the following steps:

- acquiring the diagnosis and treatment item data and case description information of a plurality of different users with the same disease, and combining each user's item data and case description information into a diagnosis and treatment sample for that user;
- reducing the dimensionality of each user's sample by deep-neural-network representation learning to extract a context attribute vector;
- performing outlier detection on each user's context attribute vector, calculating the abnormal score of each sample, and identifying samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples.

The case description information includes patient information, diagnosis information, attending physician information, and the like, such as the user's age, sex, and main diagnosis record and the attending physician's name. It can be extracted from certain fields of the case front page. Diagnosis and treatment samples are constructed for all users, with the diagnosis and treatment item data as the main features and the case description information as additional features. Representation learning converts raw data into a form that machine learning can use effectively, and it can be either supervised or unsupervised.
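The following hypothetical function illustrates how main and additional features might be combined. It concatenates item counts with a one-hot sex encoding and the patient's age. The field names are assumptions, and physician or diagnosis fields could be encoded the same way. The resulting raw vector is the kind of input that the deep-network dimension reduction would then compress.

```python
def build_sample(item_counts, case_info, item_vocab, sexes=("F", "M")):
    """Concatenate diagnosis and treatment item counts (main features) with encoded
    case description fields (other features). Field names are hypothetical.

    item_counts: {item: count}; case_info: {"age": number, "sex": str}.
    """
    item_part = [float(item_counts.get(item, 0)) for item in item_vocab]
    sex_part = [1.0 if case_info["sex"] == s else 0.0 for s in sexes]  # one-hot sex
    return item_part + sex_part + [float(case_info["age"])]


sample = build_sample({"chest_ct": 2}, {"age": 60, "sex": "M"}, ["blood_routine", "chest_ct"])
# [0.0, 2.0, 0.0, 1.0, 60.0]
```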

Specifically, a deep neural network performs representation learning on each user's diagnosis and treatment sample. It reduces the original user information, hospital information, and hospitalization details to a context attribute vector for each user. Outlier detection finds abnormal values in a data set, that is, behavior inconsistent with most data points. Here it works by calculating an abnormal score for each diagnosis and treatment sample, and any sample whose abnormal score exceeds a threshold is identified as abnormal. The abnormal score can be calculated in a user-defined way. In one approach, a metric learning method computes the distance between pairs of samples. Each sample's predicted diagnosis and treatment cost is determined from these distances, and its abnormal score follows from the difference between the predicted and actual costs. In another approach, the samples are clustered and each sample's abnormal score is determined from the clustering result.

In one embodiment, the step of performing outlier detection on each user's context attribute vector, calculating the abnormal score of each diagnosis and treatment sample, and identifying samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples includes:

- calculating the distance between the context attribute vectors of every pair of users, and collecting the users whose distance from a first user is smaller than a preset threshold into an associated user set;
- determining the first user's predicted diagnosis and treatment cost grade from the actual cost grades of the users in the associated user set;
- calculating the abnormal score of the first user's sample from the difference between the first user's actual and predicted cost grades, the abnormal score being proportional to the difference.

In particular, a Mahalanobis-distance-based metric learning method may be used to calculate the distance between the context attribute vectors of any two users. The users whose distance from the first user is smaller than a preset threshold form the associated user set, and the proportion of each actual diagnosis and treatment cost grade within that set is calculated. For example, suppose the first grade accounts for 80%, the second for 15%, and the third for 5%. The grade with the largest proportion, here the first grade, is then taken as the first user's predicted diagnosis and treatment cost grade. The difference between the first user's predicted and actual grades can serve as the abnormal score of the first user's sample, so the larger the difference, the higher the score.
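The neighbour-vote scoring can be sketched as follows, under several assumptions:

- The matrix `metric` comes from a prior metric learning step. With the identity matrix, the distance reduces to Euclidean distance.
- Cost grades are integers, so their difference is meaningful.
- Users with no associated users receive a score of 0, which is an arbitrary choice.

The function names are illustrative.

```python
import math
from collections import Counter


def mahalanobis(x, y, metric):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); `metric` (M) is assumed to be a positive
    semi-definite matrix produced by a metric learning step."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    q = sum(diff[i] * metric[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative values from rounding


def grade_anomaly_scores(vectors, grades, metric, radius):
    """For each user, neighbours closer than `radius` form the associated user set; the most
    common actual cost grade among them is the predicted grade; score = |actual - predicted|."""
    scores = []
    for i, x in enumerate(vectors):
        neigh = [grades[j] for j, y in enumerate(vectors)
                 if j != i and mahalanobis(x, y, metric) < radius]
        if not neigh:
            scores.append(0.0)  # no associated users: no evidence either way (assumption)
            continue
        predicted = Counter(neigh).most_common(1)[0][0]
        scores.append(float(abs(grades[i] - predicted)))
    return scores
```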

In one embodiment, the step of performing outlier detection on each user's context attribute vector, calculating the abnormal score of each diagnosis and treatment sample, and identifying samples whose abnormal score exceeds a threshold as abnormal diagnosis and treatment samples includes:

- clustering the users' context attribute vectors to obtain a plurality of clusters;
- obtaining the capacity (number of samples) of each cluster, sorting the clusters by capacity, and identifying a preset proportion of them, taken in ascending order of capacity, as abnormal clusters;
- determining the abnormal scores of the diagnosis and treatment samples in each abnormal cluster according to its capacity, and identifying samples whose abnormal score exceeds the threshold as abnormal diagnosis and treatment samples.

Specifically, the capacity of each cluster (the number of diagnosis and treatment samples it contains) is obtained. The abnormal scores of the diagnosis and treatment samples in the cluster are determined according to that capacity, the abnormal score being inversely proportional to the cluster capacity.
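The capacity-based scoring can be sketched as below. The sketch takes cluster labels as input because the specification does not name a clustering algorithm, so any method (for example k-means) could produce them. The 1/capacity score is one simple choice that is inversely proportional to capacity. Giving samples outside abnormal clusters a score of 0, and always marking at least one cluster as abnormal, are further assumptions. The function name is hypothetical.

```python
from collections import Counter


def cluster_capacity_scores(labels, abnormal_ratio):
    """Return (scores, abnormal_clusters) computed from per-sample cluster labels.

    labels:         cluster id of each diagnosis and treatment sample
    abnormal_ratio: preset proportion of clusters, smallest first, deemed abnormal
    """
    capacity = Counter(labels)  # cluster id -> number of samples
    ordered = sorted(capacity, key=lambda c: capacity[c])  # capacity low -> high
    k = max(1, int(len(ordered) * abnormal_ratio))  # assumption: flag at least one
    abnormal = set(ordered[:k])
    # Score is inversely proportional to capacity, and only for abnormal clusters.
    scores = [1.0 / capacity[c] if c in abnormal else 0.0 for c in labels]
    return scores, abnormal
```

With labels `[0, 0, 0, 0, 1, 1, 1, 2]` and a ratio of about one third, cluster 2 (a single sample) is the abnormal cluster. Its sample scores 1.0 and would exceed a threshold such as 0.5.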

In one embodiment, the method further comprises the following steps. The diagnosis and treatment item data are cleaned, and missing values in them are processed. The categorical fields and continuous-value fields in the data are discretized, and the continuous variables are standardized. Finally, diagnosis and treatment items are merged according to their similarity.

Specifically, the steps of this embodiment may be performed before the steps of the above embodiments. Data cleaning is the first stage of data analysis. It mainly cleans and filters obviously unreasonable variables in the diagnosis and treatment item data, such as inconsistent variables and variable values that contradict common medical knowledge. Null or NA entries are treated as missing values. When only a few fields are missing, they can be filled with the mean or the mode, and records with a large number of missing fields can be deleted directly. Categorical fields, such as principal diagnoses like tuberculosis, pneumonia and bronchitis, are discretized by mapping each category to a preset character string, for example 0001 for tuberculosis and 0002 for pneumonia. Continuous-value fields are discretized, for example by rounding a value such as 25.33. Continuous variables can be standardized using the z-score (standard score) method. Items whose names contain one another are merged; for example, "0.9% sodium chloride injection (bagged) (base)" is merged with "0.9% sodium chloride injection". Diagnosis and treatment items that serve the same diagnostic or therapeutic purpose can also be merged according to clinical knowledge. For example, the five-part blood cell differential, white blood cell count, red blood cell count and similar items are merged into a routine blood test, and the merged item is treated as a single diagnosis and treatment item in the data.
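The preprocessing steps can be sketched with the standard library alone. This is an illustrative sketch, not the applicant's code. The function names, the code book and the clinical grouping dictionary are hypothetical. Merging each name into the shortest other name it contains is one assumed reading of the name-based merging rule.

```python
from statistics import mean, mode, pstdev


def fill_missing(values, categorical):
    """Replace None with the mode (categorical field) or the mean (numeric field)."""
    present = [v for v in values if v is not None]
    fill = mode(present) if categorical else mean(present)
    return [fill if v is None else v for v in values]


def encode_category(values, codebook):
    """Map category names to preset code strings, e.g. tuberculosis -> '0001'."""
    return [codebook[v] for v in values]


def z_score(values):
    """Standardize a continuous variable as (x - mean) / population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def merge_by_name(items):
    """Map each item to the shortest other item name it contains (assumed rule)."""
    merged = {}
    for name in items:
        contained = [o for o in items if o != name and o in name]
        merged[name] = min(contained, key=len) if contained else name
    return merged


def merge_clinical(items, groups):
    """Collapse items into clinical groups; groups = {group: set of members}."""
    lookup = {member: g for g, members in groups.items() for member in members}
    return sorted({lookup.get(i, i) for i in items})
```

For example, `merge_clinical(["white blood cell count", "red blood cell count", "chest x-ray"], {"routine blood test": {"white blood cell count", "red blood cell count"}})` yields `["chest x-ray", "routine blood test"]`.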

In one embodiment, the associations among the hospital identifier or region identifier, the disease type, the standard clinical path and the standard cost are recorded. When an electronic reimbursement bill uploaded by a first terminal is received, the current hospital identifier or current region identifier and the current disease information are extracted from the bill. The standard clinical path corresponding to the bill is determined from this information. The current diagnosis and treatment items to be reimbursed in the bill are then compared with the standard diagnosis and treatment items of that standard clinical path, and only the items that match a standard diagnosis and treatment item are reimbursed. In this way abnormal cases are identified and automatically filtered, ensuring standardized and uniform reimbursement.
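The reimbursement check amounts to a keyed lookup followed by a set-membership filter. The sketch below assumes that each standard clinical path is stored as a set of standard item names, keyed by (hospital or region identifier, disease). The `Bill` layout and the function name are hypothetical, and the standard cost is not used here.

```python
from dataclasses import dataclass, field


@dataclass
class Bill:
    """Hypothetical electronic reimbursement bill."""
    hospital_id: str  # current hospital (or region) identifier
    disease: str      # current disease information
    items: list = field(default_factory=list)  # items to be reimbursed


def reimbursable_items(bill, pathways):
    """Split the bill's items into (reimbursed, rejected).

    pathways: {(hospital_or_region_id, disease): set of standard item names}
    """
    standard = pathways.get((bill.hospital_id, bill.disease))
    if standard is None:
        raise KeyError("no standard clinical path recorded for this bill")
    reimbursed = [i for i in bill.items if i in standard]
    rejected = [i for i in bill.items if i not in standard]  # filtered out
    return reimbursed, rejected
```

Raising an error when no path is recorded is a design choice for the sketch. A real system might instead route such bills to manual review.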

It should be understood that, although the steps in the flowchart of fig. 2 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order in which these steps are performed is not strictly restricted, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages. These are not necessarily completed at the same moment and may be performed at different times. Nor are they necessarily performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, there is provided an artificial intelligence based abnormal case identification apparatus, including: a text construction module 310, a topic vector distribution module 320, a diagnosis and treatment sequence determination module 330, and an abnormal case identification module 340, wherein:

the text construction module 310 is configured to obtain diagnosis and treatment item data corresponding to the same disease type, and perform text construction using the diagnosis and treatment item data as words in a text with diagnosis and treatment time as a unit, where each constructed text includes diagnosis and treatment item data of different users of the same disease type in a corresponding diagnosis and treatment time period.
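Text construction can be sketched as grouping records by time unit. In this sketch each text keeps the items of the different users separate, so that a topic model can later assign user topic weights. The record layout `(user_id, disease, day, item)`, the use of a day index as the time unit, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def build_texts(records, disease):
    """Build one 'text' per diagnosis and treatment day for a single disease type.

    records: iterable of (user_id, disease, day, item) tuples (assumed layout)
    Returns {day: {user_id: [item, ...]}}; the items act as the words of the text.
    """
    texts = defaultdict(lambda: defaultdict(list))
    for user, dis, day, item in records:
        if dis == disease:  # keep only the requested disease type
            texts[day][user].append(item)
    return {day: dict(users) for day, users in sorted(texts.items())}
```

A topic model (for example LDA) could then be fitted per text, treating each user's item list as one document.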

The topic vector distribution module 320 is configured to perform topic model construction on each text to obtain topic vector distribution corresponding to each text, where each topic vector distribution includes a plurality of word identifiers and corresponding word weights, and each topic vector distribution corresponds to a topic, where each topic has a different user topic weight with respect to different users.

The diagnosis and treatment sequence determining module 330 is configured to align topic vector distributions corresponding to the texts, cluster the aligned topic vector distributions to obtain diagnosis and treatment item categories corresponding to the users at different times, and form a diagnosis and treatment sequence corresponding to the user according to the diagnosis and treatment item categories corresponding to the users.
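The specification does not state how topics from different texts are aligned and clustered. One simple possibility is sketched below. Each topic is represented as a `{word: weight}` dictionary, and all topics are grouped by leader clustering on cosine similarity: a topic joins the most similar existing category if the similarity reaches a threshold, and otherwise starts a new category. Each user's dominant topic per day then maps to a category, and the categories in day order form the diagnosis and treatment sequence. The threshold, data layouts and function names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {word: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def align_topics(topics_by_day, threshold):
    """Leader-cluster topics from all days into shared item categories.

    topics_by_day: {day: [{word: weight}, ...]}
    Returns {(day, topic_index): category_id}.
    """
    leaders, assignment = [], {}
    for day in sorted(topics_by_day):
        for k, topic in enumerate(topics_by_day[day]):
            best, best_sim = None, threshold
            for c, leader in enumerate(leaders):
                s = cosine(topic, leader)
                if s >= best_sim:  # keep the most similar qualifying category
                    best, best_sim = c, s
            if best is None:  # no match: this topic founds a new category
                leaders.append(dict(topic))
                best = len(leaders) - 1
            assignment[(day, k)] = best
    return assignment


def user_sequences(user_topic_weights, assignment):
    """{day: {user: [weight per topic]}} -> {user: [category, ...]} in day order."""
    seqs = {}
    for day in sorted(user_topic_weights):
        for user, weights in user_topic_weights[day].items():
            top = max(range(len(weights)), key=weights.__getitem__)  # dominant topic
            seqs.setdefault(user, []).append(assignment[(day, top)])
    return seqs
```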

The abnormal case identification module 340 is configured to input the diagnosis and treatment sequences corresponding to different diagnosis and treatment times of each user into the process mining model to obtain a standard clinical path corresponding to the disease type, and identify an abnormal case according to the standard clinical path.
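The process mining model is not named in the specification. As a stand-in, the sketch below builds a directly-follows graph, the basic structure behind many process discovery algorithms. From that graph it takes the greedy most-frequent path as the "standard clinical path" and scores each case by the share of its transitions that fall outside that path. This is a deliberate simplification for illustration, and all names are hypothetical.

```python
from collections import Counter

START, END = "<start>", "<end>"


def directly_follows(sequences):
    """Count how often category a is directly followed by category b."""
    dfg = Counter()
    for seq in sequences:
        trace = [START, *seq, END]
        dfg.update(zip(trace, trace[1:]))
    return dfg


def standard_path(dfg, max_len=50):
    """Follow the most frequent outgoing edge from START until END is reached."""
    path, node = [], START
    for _ in range(max_len):  # guard against cycles in the graph
        out = [(b, n) for (a, b), n in dfg.items() if a == node]
        if not out:
            break
        node = max(out, key=lambda e: e[1])[0]
        if node == END:
            break
        path.append(node)
    return path


def deviation(seq, path):
    """Share of the case's transitions that are absent from the standard path."""
    allowed = set(zip([START, *path], [*path, END]))
    trace = [START, *seq, END]
    steps = list(zip(trace, trace[1:]))
    return sum(s not in allowed for s in steps) / len(steps)
```

Cases whose deviation exceeds a chosen threshold would be treated as abnormal cases.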

In one embodiment, the apparatus further comprises:

the abnormal diagnosis and treatment sample identification module 350 is configured to perform the following operations. It obtains the diagnosis and treatment item data and case description information of a plurality of different users with the same disease type, and combines each user's data and case description information into a diagnosis and treatment sample for that user. It reduces the dimensionality of each user's diagnosis and treatment sample through representation learning based on a deep neural network, thereby extracting a context attribute vector. It then performs outlier detection on each user's context attribute vector, calculates an abnormal score for each diagnosis and treatment sample, and identifies a diagnosis and treatment sample whose abnormal score is higher than a threshold as an abnormal diagnosis and treatment sample.

In one embodiment, the abnormal diagnosis and treatment sample identification module 350 is further configured to perform the following operations. It calculates the distance between the context attribute vectors of any two users, and collects the users whose distance from a first user is smaller than a preset threshold to obtain an associated user set. It determines a predicted diagnosis and treatment cost grade for the first user according to the actual diagnosis and treatment cost grade of each user in the associated user set. It then calculates the abnormal score of the first user's diagnosis and treatment sample according to the difference between the first user's actual and predicted diagnosis and treatment cost grades, the abnormal score being directly proportional to the difference.

In one embodiment, the abnormal diagnosis and treatment sample identification module 350 is further configured to perform the following operations. It clusters the context attribute vectors of the users to obtain a plurality of different clusters. It acquires the capacity of each cluster, sorts the clusters by capacity, and identifies a preset proportion of the clusters, taken in ascending order of capacity, as abnormal clusters. It then determines the abnormal scores of the diagnosis and treatment samples in each abnormal cluster according to the cluster capacity, and identifies the diagnosis and treatment samples whose abnormal scores are higher than the threshold as abnormal diagnosis and treatment samples.

In one embodiment, the apparatus further comprises:

the preprocessing module 360 is configured to perform the following operations. It cleans the diagnosis and treatment item data and processes missing values in them. It discretizes the categorical fields and continuous-value fields in the data and standardizes the continuous variables. It then merges diagnosis and treatment items according to their similarity.

For specific limitations of the artificial intelligence based abnormal case identification apparatus, refer to the limitations of the artificial intelligence based abnormal case identification method above; details are not repeated here. The modules of the apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. In hardware form, the modules may be embedded in, or independent of, a processor of the computer device. In software form, they may be stored in a memory of the computer device, so that the processor can invoke them and perform the corresponding operations.

In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores clinical path related data. The network interface of the computer device communicates with an external terminal through a network connection. When executed by the processor, the computer program implements an artificial intelligence based abnormal case identification method.

Those skilled in the art will appreciate that the structure shown in fig. 5 is merely a block diagram of part of the structure related to the disclosed solution. It does not limit the computer devices to which the solution applies. A particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps. Diagnosis and treatment item data corresponding to the same disease type are acquired and, taking diagnosis and treatment time as the unit, texts are constructed in which the diagnosis and treatment item data serve as words; each constructed text comprises the diagnosis and treatment item data of different users with the same disease type in the corresponding diagnosis and treatment time period. A topic model is constructed for each text to obtain its topic vector distributions. Each topic vector distribution comprises a plurality of word identifiers and corresponding word weights and corresponds to one topic, and each topic has different user topic weights for different users. The topic vector distributions of the texts are aligned, and the aligned distributions are clustered to obtain the diagnosis and treatment item categories of each user at different times; these categories form each user's diagnosis and treatment sequence. Finally, the diagnosis and treatment sequences of each user at different diagnosis and treatment times are input into the process mining model to obtain the standard clinical path for the disease type, and abnormal cases are identified according to the standard clinical path.

In one embodiment, when executing the computer program, the processor further implements the following steps. The diagnosis and treatment item data and case description information of a plurality of different users with the same disease are acquired, and each user's data and case description information are combined into a diagnosis and treatment sample for that user. The dimensionality of each user's diagnosis and treatment sample is reduced through representation learning based on a deep neural network, thereby extracting a context attribute vector. Outlier detection is then performed on each user's context attribute vector, an abnormal score is calculated for each diagnosis and treatment sample, and a diagnosis and treatment sample whose abnormal score is higher than a threshold is identified as an abnormal diagnosis and treatment sample.

In one embodiment, performing outlier detection on the context attribute vector corresponding to each user, calculating an abnormal score for each diagnosis and treatment sample, and identifying a diagnosis and treatment sample whose abnormal score is higher than a threshold as an abnormal diagnosis and treatment sample includes the following. First, the distance between the context attribute vectors of any two users is calculated, and the users whose distance from a first user is smaller than a preset threshold are collected to obtain an associated user set. Next, a predicted diagnosis and treatment cost grade for the first user is determined according to the actual diagnosis and treatment cost grade of each user in the associated user set. Finally, the abnormal score of the first user's diagnosis and treatment sample is calculated according to the difference between the first user's actual and predicted diagnosis and treatment cost grades, the abnormal score being directly proportional to the difference.

In one embodiment, performing outlier detection on the context attribute vector corresponding to each user, calculating an abnormal score for each diagnosis and treatment sample, and identifying a diagnosis and treatment sample whose abnormal score is higher than a threshold as an abnormal diagnosis and treatment sample includes the following. The context attribute vectors of the users are clustered to obtain a plurality of different clusters. The capacity of each cluster is acquired, the clusters are sorted by capacity, and a preset proportion of the clusters, taken in ascending order of capacity, is identified as abnormal clusters. The abnormal scores of the diagnosis and treatment samples in each abnormal cluster are then determined according to the cluster capacity, and the diagnosis and treatment samples whose abnormal scores are higher than the threshold are identified as abnormal diagnosis and treatment samples.

In one embodiment, when executing the computer program, the processor further implements the following steps. The diagnosis and treatment item data are cleaned, and missing values in them are processed. The categorical fields and continuous-value fields in the data are discretized, and the continuous variables are standardized. Finally, diagnosis and treatment items are merged according to their similarity.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps. Diagnosis and treatment item data corresponding to the same disease type are acquired and, taking diagnosis and treatment time as the unit, texts are constructed in which the diagnosis and treatment item data serve as words; each constructed text comprises the diagnosis and treatment item data of different users with the same disease type in the corresponding diagnosis and treatment time period. A topic model is constructed for each text to obtain its topic vector distributions. Each topic vector distribution comprises a plurality of word identifiers and corresponding word weights and corresponds to one topic, and each topic has different user topic weights for different users. The topic vector distributions of the texts are aligned, and the aligned distributions are clustered to obtain the diagnosis and treatment item categories of each user at different times; these categories form each user's diagnosis and treatment sequence. Finally, the diagnosis and treatment sequences of each user at different diagnosis and treatment times are input into the process mining model to obtain the standard clinical path for the disease type, and abnormal cases are identified according to the standard clinical path.

In one embodiment, when executed by the processor, the computer program further implements the following steps. The diagnosis and treatment item data and case description information of a plurality of different users with the same disease are acquired, and each user's data and case description information are combined into a diagnosis and treatment sample for that user. The dimensionality of each user's diagnosis and treatment sample is reduced through representation learning based on a deep neural network, thereby extracting a context attribute vector. Outlier detection is then performed on each user's context attribute vector, an abnormal score is calculated for each diagnosis and treatment sample, and a diagnosis and treatment sample whose abnormal score is higher than a threshold is identified as an abnormal diagnosis and treatment sample.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described. However, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. An abnormal case identification method based on artificial intelligence is applied to a platform server, and the method comprises the following steps:

2. The method of claim 1, further comprising:

3. The method according to claim 2, wherein the performing outlier detection on the context attribute vector corresponding to each user, calculating an abnormal score corresponding to each diagnosis and treatment sample, and identifying the diagnosis and treatment sample with the abnormal score higher than a threshold as an abnormal diagnosis and treatment sample comprises:

4. The method according to claim 2, wherein the performing outlier detection on the context attribute vector corresponding to each user, calculating an abnormal score corresponding to each diagnosis and treatment sample, and identifying the diagnosis and treatment sample with the abnormal score higher than a threshold as an abnormal diagnosis and treatment sample comprises:

5. The method according to any one of claims 1 to 4, further comprising:

6. An abnormal case recognition device based on artificial intelligence is applied to a platform server, and is characterized in that the device comprises:

7. The apparatus of claim 6, further comprising:

8. The apparatus according to claim 7, wherein the abnormal diagnosis and treatment sample identification module is further configured to calculate a distance between context attribute vectors of any two users, and obtain a set of associated users from users whose distance from the first user is smaller than a preset threshold; determining a predicted diagnosis and treatment cost grade corresponding to the first user according to the actual diagnosis and treatment cost grade corresponding to each user in the associated user set; calculating an abnormal score corresponding to the diagnosis and treatment sample of the first user according to the difference between the actual diagnosis and treatment cost grade and the predicted diagnosis and treatment cost grade of the first user, wherein the abnormal score is in direct proportion to the difference.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.