WO2020119386A1 - Method, equipment, storage medium, and device for identifying abnormal data based on big data - Google Patents

Method, equipment, storage medium, and device for identifying abnormal data based on big data

Info

Publication number
WO2020119386A1
WO2020119386A1 PCT/CN2019/118839 CN2019118839W WO2020119386A1 WO 2020119386 A1 WO2020119386 A1 WO 2020119386A1 CN 2019118839 W CN2019118839 W CN 2019118839W WO 2020119386 A1 WO2020119386 A1 WO 2020119386A1
Authority
WO
WIPO (PCT)
Prior art keywords
charging
data
item
fee table
diagnosis
Prior art date
Application number
PCT/CN2019/118839
Other languages
English (en)
French (fr)
Inventor
陈明东
黄越
胥畅
Original Assignee
平安医疗健康管理股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安医疗健康管理股份有限公司
Publication of WO2020119386A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/389 Keeping log of transactions for guaranteeing non-repudiation of a transaction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present application relates to the technical field of abnormal data identification, and in particular to a method, equipment, storage medium, and device for identifying abnormal data based on big data.
  • during hospital treatment, an insured person may be charged again for an item that is already included in a package, or be charged more times than the item was actually performed.
  • for example, the treatment package for lumbar discectomy already includes the cost of medicines, but the hospital still charges the patient for medicines again.
  • at present, the main means of investigating such repeated charging is for staff of the Human Resources and Social Security Bureau to search the huge volume of detailed diagnosis and treatment data and check whether each charge is abnormal.
  • this approach is prone to two types of problems: first, manual inspection inevitably has omissions; second, it is inefficient, time-consuming, and costly.
  • the main purpose of the present application is to provide an abnormal data identification method, device, storage medium and device based on big data, aiming to solve the technical problem of how to identify duplicate charging data more effectively in the prior art.
  • the present application provides a method for identifying abnormal data based on big data.
  • the method for identifying abnormal data based on big data includes the following steps:
  • Abnormal mining is performed on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • the anomaly mining of the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data includes:
  • the fee summary table includes several charging records, and each charging record includes at least a charging item;
  • if the package fee table and the single-item fee table contain charging records of the same charging item, the charging records containing the same charging item are used as repeated charging data.
  • the determining whether the package fee table and the single fee table contain charging records of the same charging items through a preset nested loop algorithm includes:
  • the charging records containing the same charging item are used as repeated charging data, including:
  • if the first charging record and the second charging record contain the same charging item, the first charging record and the second charging record are used as the repeated charging data.
  • before traversing the package fee table by a preset nested loop algorithm and selecting the first charging record, the method further includes:
  • the step of traversing the package fee table through a preset nested loop algorithm to select the first charge record is performed.
  • the standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data includes:
  • the sentence matrix is compressed into a sentence vector through a preset attention model, and the sentence vector is used as standardized diagnosis and treatment data.
  • the encoding the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model includes:
  • the word vector sequence is input into a preset bidirectional recurrent neural network model in the forward direction and then in the reverse direction, so that the preset bidirectional recurrent neural network model encodes the word vector sequence and outputs a sentence matrix.
  • the compressing the sentence matrix into a sentence vector through a preset attention model and using the sentence vector as standardized diagnosis and treatment data includes:
  • the present application also proposes an abnormal data identification device based on big data, which includes a memory, a processor, and computer-readable instructions for identifying abnormal data based on big data that are stored in the memory and executable on the processor.
  • the computer-readable instructions for identifying abnormal data based on big data are configured to implement, when executed by the processor, the steps of the abnormal data identification method based on big data as described above.
  • the present application also proposes a storage medium that stores computer-readable instructions for identifying abnormal data based on big data; when executed by a processor, the computer-readable instructions implement the steps of the abnormal data identification method based on big data as described above.
  • the present application also provides an apparatus for identifying abnormal data based on big data.
  • the apparatus for identifying abnormal data based on big data includes:
  • the acquisition module is used to obtain patient's diagnosis and treatment data
  • a processing module configured to perform standardized processing on the diagnosis and treatment data to obtain standardized diagnosis and treatment data
  • the mining module is configured to perform abnormal mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • in the present application, the patient's diagnosis and treatment data is obtained; the diagnosis and treatment data is standardized to obtain standardized diagnosis and treatment data; and abnormal mining is performed on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • because the diagnosis and treatment data is standardized and the preset nested loop algorithm is then used to perform abnormal mining on the standardized diagnosis and treatment data, the repeated charging data can be accurately identified.
  • the identification method is efficient and low in cost, and can urge hospitals to charge reasonably so as to protect the interests of patients.
  • FIG. 1 is a schematic structural diagram of an abnormal data identification device based on big data in a hardware operating environment involved in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for identifying abnormal data based on big data according to the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for identifying abnormal data based on big data according to the present application
  • FIG. 4 is a schematic flowchart of a third embodiment of a method for identifying abnormal data based on big data according to the present application.
  • FIG. 5 is a structural block diagram of a first embodiment of an apparatus for identifying abnormal data based on big data of the present application.
  • FIG. 1 is a schematic structural diagram of an abnormal data recognition device based on big data in a hardware operating environment according to an embodiment of the present application.
  • the abnormal data identification device based on big data may include: a processor 1001, such as a central processor (Central Processing Unit, CPU), communication bus 1002, user interface 1003, network interface 1004, memory 1005.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display (Display), and may optionally further include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in this application.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface).
  • the memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • the structure shown in FIG. 1 does not constitute a limitation on the abnormal data recognition device based on big data, which may include more or fewer components than those illustrated, combine certain components, or use a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions for identifying abnormal data based on big data.
  • the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to peripheral devices and perform data communication with them;
  • the abnormal data identification device based on big data calls, through the processor 1001, the computer-readable instructions for identifying abnormal data based on big data stored in the memory 1005, and executes the method for identifying abnormal data based on big data provided by the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for identifying abnormal data based on big data of the present application, and proposes a first embodiment of a method for identifying abnormal data based on big data of the present application.
  • the method for identifying abnormal data based on big data includes the following steps:
  • Step S10 Obtain the diagnosis and treatment data of the patient.
  • the execution subject of this embodiment is an abnormal data recognition device based on big data.
  • the abnormal data recognition device based on big data may be an electronic device such as a personal computer or a server.
  • the application scenario of this embodiment is as follows: during hospital treatment, a patient swipes a medical insurance card to settle medical expenses in real time; the card records the patient's diagnosis and treatment data, including the time of treatment, the charging items, and the charging amounts, and the data is uploaded to the human resources and social security core system.
  • a user then uses the abnormal data recognition device based on big data to perform abnormal mining on the patient's diagnosis and treatment data and identify the repeated charging data in it, so as to prevent unreasonable repeated charging in outpatient clinics and protect patients' interests.
  • the abnormal data recognition device based on big data obtains the patient's diagnosis and treatment data from the human resources and social security core system based on the patient's identity information, so that repeated charging data can subsequently be identified from it.
  • the patient identity information includes information such as the patient's name and ID number, and the ID number is used to confirm the patient's identity and manage the patient list.
  • Step S20 Perform standardized processing on the diagnosis and treatment data to obtain standardized diagnosis and treatment data.
  • the diagnosis and treatment data recorded in the human resources and social security core system is generally non-standard text information.
  • the diagnosis and treatment data is therefore standardized in advance, converting it into standardized diagnosis and treatment data that can be recognized by a computer.
  • natural language processing (NLP) technology converts the diagnosis and treatment data into standardized diagnosis and treatment data by expressing the words in the diagnosis and treatment data as vectors.
  • a preset bidirectional recurrent neural network model is used to encode the word vectors into a sentence matrix, the sentence matrix is compressed into a sentence vector through an attention model, and the sentence vector is the standardized diagnosis and treatment data.
  • Step S30 Abnormal mining is performed on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • the charging items include package charging items and single charging items.
  • a package charging item is a package that contains multiple charging items. For example, a lumbar discectomy treatment package includes bed and nursing fees, examination and testing fees, medicine and consumables fees, doctors' treatment fees, and surgery fees. A single charging item is an independent charging item, such as a surgery fee. Because charge management in some hospitals is unreasonable, items already included in a package are sometimes charged again. For example, the treatment package for lumbar discectomy already includes the medicine fee, but the hospital charges the patient for medicines again. It is therefore urgent to identify the repeated charging data in the standardized diagnosis and treatment data, so as to reduce arbitrary charging by hospitals.
  • a nested loop algorithm checks, record by record, whether the same data exists in two data tables. Based on this property, the preset nested loop algorithm is constructed to search for repeated charging data in the standardized diagnosis and treatment data. The standardized diagnosis and treatment data is divided into package diagnosis and treatment data and individual diagnosis and treatment data; taking the package data as the basis, it is determined whether the charging item of each individual record is the same as any charging item in the package data, and the repeated charging data is obtained from the result.
  • in this embodiment, the patient's diagnosis and treatment data is obtained; the diagnosis and treatment data is standardized to obtain standardized diagnosis and treatment data; and abnormal mining is performed on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • because the diagnosis and treatment data is standardized and the standardized diagnosis and treatment data is then mined with a preset nested loop algorithm, the repeated charging data can be accurately identified; identification is efficient and low in cost, and it can urge hospitals to charge reasonably and protect the interests of patients.
  • FIG. 3 is a schematic flowchart of a second embodiment of the method for identifying abnormal data based on big data of the present application. Based on the first embodiment shown in FIG. 2, the second embodiment of the method is proposed.
  • step S30 includes:
  • Step S301 Generate a fee summary table based on the standardized diagnosis and treatment data.
  • the fee summary table includes several charging records, and each charging record includes at least a charging item.
  • to check with the preset nested loop algorithm whether items included in a package charging item have been charged again, a fee summary table is first generated from the standardized diagnosis and treatment data.
  • a preset fee summary table is established with fields for the charging time, charging item, and charging amount; the charging time, charging item, charging amount, and other information are extracted from each piece of standardized diagnosis and treatment data, and the extracted information is filled into the preset fee summary table as a charging record, thereby generating a fee summary table containing several charging records.
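The table-building step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not specify a data layout, so the `ChargingRecord` class, the dictionary keys `time`, `item`, and `amount`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ChargingRecord:
    """One row of the fee summary table (field names are assumptions)."""
    charge_time: str
    charge_item: str
    amount: float


def build_fee_summary(standardized_data: Iterable[Dict]) -> List[ChargingRecord]:
    """Extract time, item and amount from each entry and store it as a record."""
    summary = []
    for entry in standardized_data:
        summary.append(
            ChargingRecord(
                charge_time=entry["time"],
                charge_item=entry["item"],
                amount=float(entry["amount"]),
            )
        )
    return summary


# Hypothetical standardized entries for one patient.
standardized = [
    {"time": "2018-12-01", "item": "lumbar discectomy package", "amount": 12000},
    {"time": "2018-12-02", "item": "bed fee", "amount": 80},
]
fee_summary = build_fee_summary(standardized)
```

Keeping each record immutable makes it safe to place the same record in several derived tables later.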
  • Step S302 Split the total fee table into a package fee table and a single fee table according to the type of the charging item.
  • the fee summary table is split into a package fee table and a single item fee table according to the type of the charging item: for each record in the fee summary table, the type of its charging item is determined; when the charging item is a package charging item, the record is saved to the package fee table, and when the charging item is a single charging item, the record is saved to the single item fee table.
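A minimal sketch of this split, assuming the type of a charging item can be looked up in a catalogue of known packages. The `PACKAGE_CONTENTS` catalogue and the record layout are hypothetical; the source does not say how the item type is determined.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: package charging items and the items they include.
PACKAGE_CONTENTS: Dict[str, List[str]] = {
    "lumbar discectomy package": ["bed fee", "nursing fee", "medicine fee", "surgery fee"],
}


def split_fee_summary(fee_summary: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Route each record to the package table or the single item table."""
    package_table: List[Dict] = []
    single_table: List[Dict] = []
    for record in fee_summary:
        if record["item"] in PACKAGE_CONTENTS:
            package_table.append(record)   # package charging item
        else:
            single_table.append(record)    # single charging item
    return package_table, single_table


fee_summary = [
    {"item": "lumbar discectomy package", "amount": 12000.0},
    {"item": "bed fee", "amount": 80.0},
    {"item": "ECG monitoring", "amount": 45.0},
]
package_table, single_table = split_fee_summary(fee_summary)
```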
  • Step S303 Determine whether the package fee table and the single fee table contain charging records of the same charging item through a preset nested loop algorithm.
  • Step S304 If the package fee table and the single item fee table contain charging records of the same charging item, then the charging records containing the same charging item are used as the repeated charging data.
  • after the package fee table and the single item fee table are obtained, a preset nested loop algorithm determines whether the package fee table and the single item fee table contain charging records of the same charging item; if they do, the charging records containing the same charging item are used as repeated charging data.
  • step S303 includes:
  • the step S304 includes:
  • if the first charging record and the second charging record contain the same charging item, the first charging record and the second charging record are used as the repeated charging data.
  • the nested loop algorithm involves a driving table and a driven table.
  • in this embodiment, the package fee table is used as the driving table and the single item fee table as the driven table.
  • the package fee table is traversed: a first charging record is selected from the package fee table and matched in turn against the multiple second charging records in the single item fee table. A successful match indicates that the first charging record and the second charging record contain the same charging item; a failed match indicates that they do not.
  • the first charging record and the second charging record that contain the same charging item are then used as repeated charging data.
  • for example, the first charging record is a charging record for lumbar discectomy treatment, which includes the charging time, the charging item, and the charging amount; its charging item covers bed and nursing fees, examination and testing fees, medicine and consumables fees, doctors' treatment fees, and surgery fees.
  • there are multiple second charging records, whose charging items are, in turn, the bed fee, ECG monitoring, and oxygen therapy.
  • the first charging record is matched against the second charging records in sequence: the lumbar discectomy record is first matched against the bed fee, and the match succeeds; it is then matched against ECG monitoring, and the match fails; finally it is matched against oxygen therapy, and the match fails. Therefore, the first charging record, which includes the bed fee, and the second charging record for the bed fee are used as repeated charging data.
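The lumbar discectomy example above can be reproduced with a plain double loop. This is a sketch, not the patented implementation: the `includes` field, the record layout, and the assumption that all records belong to one patient and one treatment episode are illustrative choices not stated in the source.

```python
from typing import Dict, List, Tuple


def find_repeated_charges(
    package_table: List[Dict], single_table: List[Dict]
) -> List[Tuple[Dict, Dict]]:
    """Nested loop: package table drives, single item table is driven."""
    repeated = []
    for first in package_table:               # outer loop over the driving table
        included = set(first["includes"])
        for second in single_table:           # inner loop over the driven table
            if second["item"] in included:    # match succeeds: same charging item
                repeated.append((first, second))
    return repeated


package_table = [
    {
        "item": "lumbar discectomy treatment",
        "includes": [
            "bed fee", "nursing fee", "examination and testing fee",
            "medicine and consumables fee", "doctor's treatment fee", "surgery fee",
        ],
    }
]
single_table = [
    {"item": "bed fee"},
    {"item": "ECG monitoring"},
    {"item": "oxygen therapy"},
]

repeated = find_repeated_charges(package_table, single_table)
pairs = [(first["item"], second["item"]) for first, second in repeated]
```

With these inputs only the bed fee matches, so `pairs` holds a single entry pairing the package record with the bed fee record.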
  • before traversing the package fee table through a preset nested loop algorithm and selecting the first charging record, the method further includes:
  • the step of traversing the package fee table through a preset nested loop algorithm to select the first charge record is performed.
  • in a nested loop, the number of times a record is selected from the driving table equals the number of records in the driving table. Therefore, to improve the execution efficiency of the preset nested loop algorithm, the table with fewer records is used as the driving table. The numbers of records in the package fee table and the single item fee table are obtained and compared. If the number of records in the package fee table is lower than the number of records in the single item fee table, the package fee table is used as the driving table and the single item fee table as the driven table, and the step of traversing the package fee table through the preset nested loop algorithm to select the first charging record is performed.
  • otherwise, the single item fee table is used as the driving table and the package fee table as the driven table: the single item fee table is traversed, charging records are selected, and each selected charging record is matched against the charging records in the package fee table to identify repeated charging data, which is convenient and efficient.
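The choice of driving table can be sketched as below. The function name, the `includes` field, and the returned label are hypothetical. Note that in a plain in-memory double loop the total number of comparisons is the same whichever table drives; the choice matters mainly when looking up the driven table is cheaper than a full scan, as with indexed database joins.

```python
from typing import Dict, List, Tuple


def nested_loop_with_driver(
    package_table: List[Dict], single_table: List[Dict]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Use the table with fewer records as the driving (outer) table."""
    package_drives = len(package_table) < len(single_table)
    if package_drives:
        driving, driven = package_table, single_table
    else:
        driving, driven = single_table, package_table

    repeated = []
    for outer in driving:
        for inner in driven:
            # Restore the (package, single) roles regardless of loop order.
            package, single = (outer, inner) if package_drives else (inner, outer)
            if single["item"] in package["includes"]:
                repeated.append((package["item"], single["item"]))
    return ("package" if package_drives else "single"), repeated


packages = [{"item": "lumbar discectomy treatment", "includes": ["bed fee", "surgery fee"]}]
singles = [{"item": "bed fee"}, {"item": "ECG monitoring"}, {"item": "oxygen therapy"}]

driver, repeated = nested_loop_with_driver(packages, singles)
driver_swapped, repeated_swapped = nested_loop_with_driver(packages, singles[:1])
```

Both orderings report the same repeated pair; only the table that is traversed in the outer loop changes.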
  • in this embodiment, a fee summary table is generated based on the standardized diagnosis and treatment data, where the fee summary table includes several charging records and each charging record includes at least a charging item; the fee summary table is split into a package fee table and a single item fee table according to the type of the charging item; and a preset nested loop algorithm is used to determine whether the package fee table and the single item fee table contain charging records of the same charging item. If they do, the charging records containing the same charging item are used as repeated charging data. Because the package fee table and the single item fee table are matched record by record, all repeated charging data can be accurately obtained, hospitals can be urged to charge reasonably, and patients' interests can be protected.
  • FIG. 4 is a schematic flowchart of a third embodiment of a method for identifying abnormal data based on big data of the present application. Based on the second embodiment shown in FIG. 3, the third embodiment of the method is proposed.
  • step S20 includes:
  • Step S201 Perform word segmentation processing on the diagnosis and treatment data to generate a word sequence.
  • Step S202 Convert the words in the word sequence into word vectors, and generate corresponding word vector sequences.
  • the diagnosis and treatment data needs to be converted into standardized diagnosis and treatment data that can be recognized by a computer, such as a vector.
  • first, word segmentation processing is performed on the diagnosis and treatment data to generate a word sequence, which contains each word in the diagnosis and treatment data together with the order of the words.
  • the words in the word sequence are then converted into word vectors, and the word vectors are combined in word order to obtain a word vector sequence, which contains the word vectors of the diagnosis and treatment data and their order.
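Steps S201 and S202 can be illustrated with a toy example. Everything here is an assumption made for illustration: whitespace splitting stands in for a real word segmenter, and the tiny `EMBEDDINGS` table with made-up values stands in for a trained word vector model, neither of which the source specifies.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors; real systems learn these from data.
EMBEDDINGS: Dict[str, List[float]] = {
    "lumbar": [0.9, 0.1, 0.0],
    "discectomy": [0.8, 0.2, 0.1],
    "bed": [0.1, 0.9, 0.0],
    "fee": [0.0, 0.3, 0.9],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def to_word_vectors(words: List[str]) -> List[List[float]]:
    """Map each word to its vector, preserving word order."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in words]


words = segment("Bed fee")
word_vectors = to_word_vectors(words)
```

The resulting list keeps one vector per word in the original order, which is what the later encoding step consumes.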
  • Step S203 encode the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model.
  • the preset bidirectional recurrent neural network (BRNN) model is a neural network model with a feedback structure.
  • the word vector sequence is input into the preset bidirectional recurrent neural network model, so that the model encodes the word vector sequence and outputs a sentence matrix, in which each row represents the meaning of the corresponding word in its context.
  • Step S204 Compress the sentence matrix into a sentence vector through a preset attention model, and use the sentence vector as standardized diagnosis and treatment data.
  • the attention model (Attention Model) is used to select, from a large amount of information, the information that is most critical to the current task goal; the preset attention model is used to extract valid data from the sentence matrix and convert the valid data into a sentence vector.
  • step S203 includes:
  • the word vector sequence is input into a preset bidirectional recurrent neural network model in the forward direction and then in the reverse direction, so that the preset bidirectional recurrent neural network model encodes the word vector sequence and outputs a sentence matrix.
  • forward input means inputting the word vectors in the word vector sequence into the preset bidirectional recurrent neural network model in positional order, one at each corresponding time step.
  • reverse input means inputting the word vectors in the word vector sequence into the preset bidirectional recurrent neural network model in reverse order, one at each corresponding time step.
  • the input signal of the preset bidirectional recurrent neural network model at each time step also includes the output signal of the model at the previous time step.
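The forward and reverse passes can be sketched in pure Python. This is a deliberately simplified illustration, not the model in the source: scalar weights `w_in` and `w_rec` replace the learned weight matrices of a real recurrent network, and their values are arbitrary assumptions.

```python
import math
from typing import List

Vector = List[float]


def rnn_pass(inputs: List[Vector], w_in: float, w_rec: float) -> List[Vector]:
    """Toy elementwise RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    if not inputs:
        return []
    hidden = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        # The previous hidden state feeds back into the current step.
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        states.append(hidden)
    return states


def encode_bidirectional(
    word_vectors: List[Vector], w_in: float = 0.5, w_rec: float = 0.3
) -> List[Vector]:
    """Run a forward and a reverse pass and concatenate the states per word."""
    forward = rnn_pass(word_vectors, w_in, w_rec)
    backward = rnn_pass(word_vectors[::-1], w_in, w_rec)[::-1]  # re-align to word order
    return [f + b for f, b in zip(forward, backward)]


word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_matrix = encode_bidirectional(word_vectors)
```

Each row of `sentence_matrix` corresponds to one word and combines what precedes it (forward half) with what follows it (reverse half).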
  • step S204 includes:
  • the context vector expresses the contextual relationship between word vectors.
  • the context vector is extracted from the sentence matrix through the preset attention model.
  • compressing the sentence matrix into a sentence vector according to the context vector improves the accuracy and comprehensiveness of the sentence vector, thereby yielding accurate standardized diagnosis and treatment data.
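One common way to compress a sentence matrix into a single vector is attention pooling, sketched below. It is offered as an assumption about what the preset attention model might look like: the fixed `query` vector stands in for learned attention parameters, and the softmax-weighted sum plays the role of the context-based compression described above.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(sentence_matrix: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Score each row against the query, softmax the scores, and average the rows."""
    scores = [sum(q * v for q, v in zip(query, row)) for row in sentence_matrix]
    max_score = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sentence_matrix[0])
    sentence_vector = [
        sum(w * row[j] for w, row in zip(weights, sentence_matrix)) for j in range(dim)
    ]
    return weights, sentence_vector


sentence_matrix = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_vector = attention_pool(sentence_matrix, [0.0, 0.0])
focused_weights, focused_vector = attention_pool(sentence_matrix, [5.0, 0.0])
```

A zero query weights every row equally, while a query aligned with the first row shifts most of the weight onto it.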
  • in this embodiment, word segmentation processing is performed on the diagnosis and treatment data to generate a word sequence; the words in the word sequence are converted into word vectors, and a corresponding word vector sequence is generated.
  • the word vector sequence is encoded into a sentence matrix according to a preset bidirectional recurrent neural network model, the sentence matrix is compressed into a sentence vector through a preset attention model, and the sentence vector is used as standardized diagnosis and treatment data. Because the context vector is relied on, the efficiency and accuracy of generating standardized diagnosis and treatment data are improved.
  • an embodiment of the present application further provides a storage medium, and the storage medium may be a non-volatile readable storage medium.
  • the storage medium stores computer-readable instructions for identifying abnormal data based on big data.
  • when the computer-readable instructions for identifying abnormal data based on big data are executed by a processor, the steps of the method for identifying abnormal data based on big data as described above are implemented. For the method implemented when the computer-readable instructions are executed, refer to the embodiments of the method for identifying abnormal data based on big data in the present application; details are not repeated here.
  • an embodiment of the present application further provides an apparatus for identifying abnormal data based on big data.
  • the apparatus for identifying abnormal data based on big data includes:
  • the obtaining module 10 is used to obtain patient diagnosis and treatment data
  • the processing module 20 is configured to perform standardized processing on the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
  • the mining module 30 is configured to perform abnormal mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify repeated charging data in the standardized diagnosis and treatment data.
  • the mining module 30 is further configured to generate a fee summary table based on the standardized diagnosis and treatment data, where the fee summary table includes several charging records and each charging record includes at least a charging item; split the fee summary table into a package fee table and a single item fee table according to the type of the charging item; determine through the preset nested loop algorithm whether the package fee table and the single item fee table contain charging records of the same charging item; and, if so, use the charging records containing the same charging item as repeated charging data.
  • the mining module 30 is further configured to traverse the package fee table through a preset nested loop algorithm to select a first charging record; match the first charging record against the multiple second charging records in the single item fee table in sequence, and determine from the matching result whether the first charging record and the second charging record contain the same charging item; and use the first charging record and the second charging record containing the same charging item as repeated charging data.
  • the mining module 30 is further configured to compare the number of records in the package fee table with the number of records in the single item fee table, and, when the number of records in the package fee table is lower than the number of records in the single item fee table, perform the step of traversing the package fee table through the preset nested loop algorithm to select the first charging record.
  • the processing module 20 is further configured to perform word segmentation processing on the diagnosis and treatment data to generate a word sequence; convert the words in the word sequence into word vectors to generate a corresponding word vector sequence; encode the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model; and compress the sentence matrix into a sentence vector through a preset attention model, using the sentence vector as standardized diagnosis and treatment data.
  • the processing module 20 is further configured to input the word vector sequence into a preset bidirectional recurrent neural network model in the forward direction and then in the reverse direction, so that the model encodes the word vector sequence and outputs a sentence matrix.
  • the processing module 20 is further configured to extract a context vector from the sentence matrix through a preset attention model, compress the sentence matrix into a sentence vector according to the context vector, and use the sentence vector as standardized diagnosis and treatment data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

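A big data-based abnormal data identification method, device, storage medium, and apparatus. The method includes: acquiring a patient's diagnosis and treatment data (S10); standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data (S20); and performing anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data (S30). Because the patient's diagnosis and treatment data is standardized and the preset nested loop algorithm is then used to mine the standardized data for anomalies, duplicate charging data can be identified accurately, with high efficiency and low cost; the method can also urge hospitals to charge reasonably and protect patients' interests.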

Description

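Big data-based abnormal data identification method, device, storage medium, and apparatus
This application claims priority to Chinese patent application No. 201811530843.7, filed with the China Patent Office on December 13, 2018 and entitled "Big data-based duplicate charging identification method, device, storage medium, and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of abnormal data identification, and in particular to a big data-based abnormal data identification method, device, storage medium, and apparatus.
Background
Because the medical insurance system is imperfect, an insured person receiving hospital treatment may be charged again for an item already included in a package, or charged more times than the item was actually performed. For example, a lumbar discectomy treatment package already includes drug fees, yet the hospital charges the patient for drugs again. At present, the main means of checking for such duplicate charging is for staff of the human resources and social security bureau to search through massive volumes of itemized diagnosis and treatment data and verify whether the charges are abnormal. This approach is prone to two kinds of problems: first, manual checking inevitably misses cases; second, it is inefficient, time-consuming, and costly.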
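Summary
The main purpose of this application is to provide a big data-based abnormal data identification method, device, storage medium, and apparatus, aiming to solve the technical problem in the prior art of how to identify duplicate charging data more effectively.
To achieve the above purpose, this application provides a big data-based abnormal data identification method, which includes the following steps:
acquiring a patient's diagnosis and treatment data;
standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
performing anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.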
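Preferably, performing anomaly mining on the standardized diagnosis and treatment data through the preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data includes:
generating a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item;
splitting the fee summary table into a package fee table and a single-item fee table according to the type of the charging item;
determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item;
if the package fee table and the single-item fee table contain charging records with the same charging item, taking the charging records that contain the same charging item as duplicate charging data.
Preferably, determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item includes:
traversing the package fee table through the preset nested loop algorithm and selecting a first charging record;
matching the first charging record in turn against multiple second charging records in the single-item fee table, and determining from the matching result whether the first charging record and a second charging record contain the same charging item;
and taking the charging records that contain the same charging item as duplicate charging data, if the package fee table and the single-item fee table contain such records, includes:
if the first charging record and the second charging record contain the same charging item, taking the first charging record and the second charging record that contain the same charging item as duplicate charging data.
Preferably, before traversing the package fee table through the preset nested loop algorithm and selecting the first charging record, the method further includes:
comparing the number of records in the package fee table with the number of records in the single-item fee table;
if the number of records in the package fee table is lower than the number of records in the single-item fee table, performing the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.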
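Preferably, standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data includes:
performing word segmentation on the diagnosis and treatment data to generate a word sequence;
converting the words in the word sequence into word vectors to generate a corresponding word vector sequence;
encoding the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model;
compressing the sentence matrix into a sentence vector through a preset attention model, and using the sentence vector as the standardized diagnosis and treatment data.
Preferably, encoding the word vector sequence into a sentence matrix according to the preset bidirectional recurrent neural network model includes:
inputting the word vector sequence into the preset bidirectional recurrent neural network model first in the forward direction and then in the reverse direction, so that the model encodes the word vector sequence and outputs a sentence matrix.
Preferably, compressing the sentence matrix into a sentence vector through the preset attention model and using the sentence vector as the standardized diagnosis and treatment data includes:
extracting a context vector from the sentence matrix through the preset attention model;
compressing the sentence matrix into a sentence vector according to the context vector, and using the sentence vector as the standardized diagnosis and treatment data.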
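In addition, to achieve the above purpose, this application further proposes a big data-based abnormal data identification device, which includes a memory, a processor, and big data-based abnormal data identification computer-readable instructions stored in the memory and executable on the processor, the computer-readable instructions being configured to implement the steps of the big data-based abnormal data identification method described above.
In addition, to achieve the above purpose, this application further proposes a storage medium storing big data-based abnormal data identification computer-readable instructions which, when executed by a processor, implement the steps of the big data-based abnormal data identification method described above.
In addition, to achieve the above purpose, this application further proposes a big data-based abnormal data identification apparatus, which includes:
an acquisition module, configured to acquire a patient's diagnosis and treatment data;
a processing module, configured to standardize the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
a mining module, configured to perform anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.
In this application, a patient's diagnosis and treatment data is acquired; the data is standardized to obtain standardized diagnosis and treatment data; and anomaly mining is performed on the standardized data through a preset nested loop algorithm to identify the duplicate charging data in it. Because the patient's diagnosis and treatment data is standardized and the preset nested loop algorithm is then used to mine the standardized data for anomalies, duplicate charging data can be identified accurately. The identification method is efficient and low-cost, and it can urge hospitals to charge reasonably and protect patients' interests.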
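Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the big data-based abnormal data identification device in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the big data-based abnormal data identification method of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the big data-based abnormal data identification method of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the big data-based abnormal data identification method of this application;
FIG. 5 is a structural block diagram of a first embodiment of the big data-based abnormal data identification apparatus of this application.
The realization of the purpose, the functional features, and the advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.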
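Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the big data-based abnormal data identification device in the hardware operating environment involved in the embodiments of this application.
As shown in FIG. 1, the big data-based abnormal data identification device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and, optionally, standard wired and wireless interfaces; in this application the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or non-volatile memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage apparatus independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the big data-based abnormal data identification device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and big data-based abnormal data identification computer-readable instructions.
In the device shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to peripherals and exchange data with them; the device uses the processor 1001 to call the big data-based abnormal data identification computer-readable instructions stored in the memory 1005 and performs the big data-based abnormal data identification method provided in the embodiments of this application.
Based on the above hardware structure, embodiments of the big data-based abnormal data identification method of this application are proposed.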
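Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the big data-based abnormal data identification method of this application, on which the first embodiment is based.
In the first embodiment, the big data-based abnormal data identification method includes the following steps:
Step S10: Acquire a patient's diagnosis and treatment data.
It should be noted that this embodiment is executed by the big data-based abnormal data identification device, which may be an electronic device such as a personal computer or a server. The application scenario is as follows: when a patient is treated at a hospital and swipes a medical insurance card to settle the fees immediately, the card records the patient's diagnosis and treatment data, which includes the visit time, charging items, charged amounts, and so on, and the data is uploaded to the human resources and social security core system. At fixed intervals, for example once a quarter, a user uses the device to perform anomaly mining on patients' diagnosis and treatment data and identify the duplicate charging data in it, thereby preventing unreasonable duplicate charging in outpatient care and protecting patients' interests.
In a specific implementation, the device obtains the patient's diagnosis and treatment data from the human resources and social security core system according to the patient's identity information, so that duplicate charging data can subsequently be identified from it. The patient identity information includes the patient's name, ID card number, and similar information; the ID card number is used to confirm the patient's identity and to manage the patient list.
Step S20: Standardize the diagnosis and treatment data to obtain standardized diagnosis and treatment data.
It can be understood that the diagnosis and treatment data recorded in the human resources and social security core system is generally non-standard text. To make duplicate charging data easy to identify, the data is standardized before anomaly mining, converting it into standardized diagnosis and treatment data that a computer can recognize.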
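In a specific implementation, natural language processing (NLP) techniques are used to convert the diagnosis and treatment data into standardized diagnosis and treatment data. The words in the diagnosis and treatment data are represented as vectors; to capture the relationships between the words, a preset bidirectional recurrent neural network model encodes the vectors into a sentence matrix, and an attention model compresses the sentence matrix into a sentence vector, which is the standardized diagnosis and treatment data.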
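Step S30: Perform anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify the duplicate charging data in the standardized diagnosis and treatment data.
It should be noted that charging items include package charging items and single charging items. A package charging item is a package that contains multiple charging items; for example, a lumbar discectomy treatment package contains bed and nursing fees, examination and laboratory test fees, drug and consumable fees, the doctor's consultation fee, the surgery fee, and other charging items. A single charging item is one independent charging item, such as a surgery fee. Because charging is poorly managed at some hospitals, items inside a package are charged again separately; for example, the lumbar discectomy treatment package already includes drug fees, yet the hospital charges the patient for drugs again. It is therefore urgent to identify the duplicate charging data in the standardized diagnosis and treatment data and reduce arbitrary charging by hospitals.
In a specific implementation, this embodiment aims to identify whether a charging item inside a package charging item has been charged again, and a nested loop algorithm checks record by record whether two data tables contain the same data. The preset nested loop algorithm is constructed on the basis of this property to find the duplicate charging data in the standardized diagnosis and treatment data. The standardized diagnosis and treatment data is split into package diagnosis and treatment data and single-item diagnosis and treatment data; taking the package data as the baseline, it is determined whether the charging item of each single-item record is the same as any charging item in the package data, and the duplicate charging data is obtained from the result.
In the first embodiment, a patient's diagnosis and treatment data is acquired; the data is standardized to obtain standardized diagnosis and treatment data; and anomaly mining is performed on the standardized data through a preset nested loop algorithm to identify the duplicate charging data in it. Because the patient's diagnosis and treatment data is standardized and the preset nested loop algorithm is then used to mine the standardized data for anomalies, duplicate charging data can be identified accurately, with high efficiency and low cost; the method can also urge hospitals to charge reasonably and protect patients' interests.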
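Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the big data-based abnormal data identification method of this application. The second embodiment is proposed on the basis of the first embodiment shown in FIG. 2.
In the second embodiment, step S30 includes:
Step S301: Generate a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item.
It should be noted that, to identify duplicate charging data, the preset nested loop algorithm checks item by item whether a charging item inside a package charging item has been charged again; the first step is to generate the fee summary table from the standardized diagnosis and treatment data.
In a specific implementation, a preset fee summary table is created with columns such as charging time, charging item, and charged amount. The charging time, charging item, charged amount, and similar information are extracted from each piece of standardized diagnosis and treatment data and entered into the preset fee summary table as one charging record, producing a fee summary table that contains a number of charging records.
Step S302: Split the fee summary table into a package fee table and a single-item fee table according to the type of the charging item.
It can be understood that, so that the preset nested loop algorithm can identify the duplicate charging data in the standardized diagnosis and treatment data, the fee summary table is split into a package fee table and a single-item fee table according to the type of the charging item. For each record in the fee summary table, the type of its charging item is determined: if it is a package charging item, the record is saved to the package fee table; if it is a single charging item, the record is saved to the single-item fee table.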
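The construction and split described in steps S301 and S302 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `ChargeRecord` fields, the `item_type` values `"package"` and `"single"`, the sample times and amounts, and the premise that each standardized record is already available as a dictionary with these keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChargeRecord:
    time: str        # charging time
    item: str        # charging item
    amount: float    # charged amount
    item_type: str   # hypothetical label: "package" or "single"


def build_fee_summary(standardized_rows):
    """Step S301: turn each standardized record into one charging record."""
    return [
        ChargeRecord(row["time"], row["item"], row["amount"], row["item_type"])
        for row in standardized_rows
    ]


def split_fee_summary(fee_summary):
    """Step S302: route each record to the package or single-item fee table."""
    package_table, single_table = [], []
    for record in fee_summary:
        if record.item_type == "package":
            package_table.append(record)
        elif record.item_type == "single":
            single_table.append(record)
        else:
            raise ValueError(f"unknown charging item type: {record.item_type}")
    return package_table, single_table


# Illustrative rows only; the values are made up for the example.
rows = [
    {"time": "2018-10-01", "item": "lumbar discectomy package", "amount": 20000.0, "item_type": "package"},
    {"time": "2018-10-02", "item": "bed fee", "amount": 80.0, "item_type": "single"},
    {"time": "2018-10-02", "item": "ECG monitoring", "amount": 60.0, "item_type": "single"},
]
package_table, single_table = split_fee_summary(build_fee_summary(rows))
```

Keeping the type label on each record makes the split a single pass over the fee summary table.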
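Step S303: Determine, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item.
Step S304: If the package fee table and the single-item fee table contain charging records with the same charging item, take the charging records that contain the same charging item as duplicate charging data.
It should be noted that, once the package fee table and the single-item fee table have been obtained, the preset nested loop algorithm determines whether the two tables contain charging records with the same charging item; if they do, the charging records that contain the same charging item are taken as duplicate charging data.
Further, step S303 includes:
traversing the package fee table through the preset nested loop algorithm and selecting a first charging record;
matching the first charging record in turn against multiple second charging records in the single-item fee table, and determining from the matching result whether the first charging record and a second charging record contain the same charging item. In this embodiment, step S304 includes:
if the first charging record and the second charging record contain the same charging item, taking the first charging record and the second charging record that contain the same charging item as duplicate charging data.
It can be understood that a nested loop algorithm involves a driving table and a driven table. In this embodiment, the package fee table serves as the driving table and the single-item fee table as the driven table. The package fee table is traversed and a first charging record is selected from it; the first charging record is matched in turn against the multiple second charging records in the single-item fee table. A successful match means the first and second charging records contain the same charging item; a failed match means they do not. The first and second charging records that contain the same charging item are then taken as duplicate charging data.
In a specific implementation, suppose for example that the first charging record is the charging record for lumbar discectomy treatment, which includes the charging time, charging item, and charged amount, where the charging item is a package charging item covering bed and nursing fees, examination and laboratory test fees, drug and consumable fees, the doctor's consultation fee, and the surgery fee. The second charging records comprise multiple records whose charging items are, in order, bed fee, ECG monitoring, and oxygen therapy. When the first charging record is matched against them in turn, the lumbar discectomy record is first matched against the bed fee, and the match succeeds; it is then matched against ECG monitoring, and the match fails; finally it is matched against oxygen therapy, and the match fails. The first charging record, which contains the bed fee, and the second charging record for the bed fee are therefore taken as duplicate charging data.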
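The lumbar discectomy example above can be traced with a short Python sketch of the nested loop. It is an illustration under assumptions rather than the claimed implementation: the `ChargeRecord` layout, the `included_items` field that lists what a package covers, and the sample times and amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeRecord:
    time: str
    item: str
    amount: float
    # Hypothetical field: items a package covers (empty for single items).
    included_items: frozenset = frozenset()


def find_duplicate_charges(package_table, single_table):
    """Nested loop: the package table drives, the single-item table is driven."""
    duplicates = []
    for first in package_table:        # select a first charging record
        for second in single_table:    # match it against each second record
            if second.item in first.included_items:
                duplicates.append((first, second))
    return duplicates


package_table = [
    ChargeRecord(
        "2018-10-01",
        "lumbar discectomy package",
        20000.0,
        frozenset({"bed fee", "nursing fee", "examination fee", "drug fee", "surgery fee"}),
    )
]
single_table = [
    ChargeRecord("2018-10-02", "bed fee", 80.0),
    ChargeRecord("2018-10-02", "ECG monitoring", 60.0),
    ChargeRecord("2018-10-03", "oxygen therapy", 40.0),
]

duplicates = find_duplicate_charges(package_table, single_table)
```

Here only the bed fee record matches, so `duplicates` holds one pair made of the package record and the bed fee record.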
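Before traversing the package fee table through the preset nested loop algorithm and selecting the first charging record, the method further includes:
comparing the number of records in the package fee table with the number of records in the single-item fee table;
if the number of records in the package fee table is lower than the number of records in the single-item fee table, performing the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.
It should be noted that the number of record selections from the driving table equals the number of records in the driving table. To improve the efficiency of the preset nested loop algorithm, the table with fewer records is therefore used as the driving table. The numbers of records in the package fee table and the single-item fee table are obtained and compared; if the package fee table has fewer records than the single-item fee table, the package fee table serves as the driving table and the single-item fee table as the driven table, and the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record is performed.
In a specific implementation, if the number of records in the package fee table is not lower than that in the single-item fee table, the single-item fee table serves as the driving table and the package fee table as the driven table: the single-item fee table is traversed, a charging record is selected, and the selected record is matched against the charging records in the package fee table to identify the duplicate charging data conveniently and efficiently.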
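The choice of driving table can be sketched as follows. This is a hedged illustration: the dictionary layout with the keys `"item"` and `"includes"` and the sample data are assumptions made for the example, not part of the source.

```python
def covers(package, single):
    """True if the single item is already included in the package (hypothetical layout)."""
    return single["item"] in package["includes"]


def find_duplicates(package_table, single_table):
    """Drive the nested loop from whichever table has fewer records."""
    duplicates = []
    if len(package_table) < len(single_table):
        # Package fee table is the driving table.
        for package in package_table:
            for single in single_table:
                if covers(package, single):
                    duplicates.append((package["item"], single["item"]))
    else:
        # Otherwise the single-item fee table is the driving table.
        for single in single_table:
            for package in package_table:
                if covers(package, single):
                    duplicates.append((package["item"], single["item"]))
    return duplicates


packages = [{"item": "lumbar discectomy package", "includes": {"bed fee", "drug fee"}}]
singles = [{"item": "bed fee"}, {"item": "ECG monitoring"}, {"item": "drug fee"}]
result = find_duplicates(packages, singles)
```

In this plain in-memory sketch both orders perform the same number of pairwise comparisons; the choice changes only how many outer-loop selections are made, which is the quantity the text refers to.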
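In the second embodiment, a fee summary table is generated from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item; the fee summary table is split into a package fee table and a single-item fee table according to the type of the charging item; the preset nested loop algorithm determines whether the package fee table and the single-item fee table contain charging records with the same charging item, and if so, the charging records that contain the same charging item are taken as duplicate charging data. Because the package fee table and the single-item fee table are matched record by record, all duplicate charging data can be obtained accurately, which can urge hospitals to charge reasonably and protect patients' interests.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the big data-based abnormal data identification method of this application. The third embodiment is proposed on the basis of the second embodiment shown in FIG. 3.
In the third embodiment, step S20 includes:
Step S201: Perform word segmentation on the diagnosis and treatment data to generate a word sequence.
Step S202: Convert the words in the word sequence into word vectors to generate a corresponding word vector sequence.
It can be understood that standardizing the diagnosis and treatment data requires converting it into a form a computer can recognize, such as vectors. In this embodiment, word segmentation is performed on the diagnosis and treatment data to generate a word sequence, which contains each word of the diagnosis and treatment data together with the order of the words. The words in the word sequence are converted into word vectors and, combined with the word order, yield the word vector sequence, which contains the word vectors of the diagnosis and treatment data together with their order.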
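Steps S201 and S202 can be illustrated with a small Python sketch. The source does not name a segmentation method or an embedding model, so this example substitutes forward maximum matching over a toy dictionary and a hand-written lookup table; `VOCABULARY`, `EMBEDDINGS`, and the two-dimensional vectors are hypothetical values chosen only for illustration.

```python
def segment(text, vocabulary):
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max(len(word) for word in vocabulary)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


def to_word_vectors(words, embeddings, dim):
    """Look up each word; unknown words map to a zero vector."""
    unknown = [0.0] * dim
    return [embeddings.get(word, unknown) for word in words]


# Hypothetical dictionary and embedding table.
VOCABULARY = {"腰椎间盘摘除术", "床位费", "护理费"}
EMBEDDINGS = {
    "腰椎间盘摘除术": [0.9, 0.1],
    "床位费": [0.2, 0.8],
    "护理费": [0.3, 0.7],
}

word_sequence = segment("腰椎间盘摘除术床位费", VOCABULARY)
word_vector_sequence = to_word_vectors(word_sequence, EMBEDDINGS, dim=2)
```

The order of `word_vector_sequence` follows the order of `word_sequence`, which is the property the later encoding step relies on.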
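Step S203: Encode the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model.
It should be noted that the preset bidirectional recurrent neural network (BRNN) model is a neural network model with a feedback structure. The word vector sequence is input into the preset BRNN model so that the model encodes it and outputs a sentence matrix, each row of which represents the meaning a word expresses in its context.
Step S204: Compress the sentence matrix into a sentence vector through a preset attention model, and use the sentence vector as the standardized diagnosis and treatment data.
It can be understood that an attention model selects, from a large amount of information, the information most critical to the current task. The preset attention model extracts the useful data from the sentence matrix and converts it into a sentence vector.
Further, in the third embodiment, step S203 includes:
inputting the word vector sequence into the preset bidirectional recurrent neural network model first in the forward direction and then in the reverse direction, so that the model encodes the word vector sequence and outputs a sentence matrix.
It should be noted that the word vector sequence is input into the preset BRNN model in the forward direction and then in the reverse direction. Forward input means feeding the word vectors of the sequence, in positional order, into the model at the corresponding time steps; reverse input means feeding the word vectors in reverse order into the model at the corresponding time steps. At each time step the model's input also includes its own output from the previous time step. Once both forward and reverse input are complete, the recurrence stops and the sentence matrix is output.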
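The forward and reverse passes described above can be sketched with a plain tanh recurrent cell in pure Python. The source does not specify the cell type, the weights, or the hidden size, so the randomly initialized weights, `hidden_size=3`, and the concatenation of the two directions into one row per word are assumptions of this sketch; a trained model would supply real parameters.

```python
import math
import random


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh recurrent cell over the inputs and return every hidden state."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        # Each step combines the current word vector with the previous output.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_bidirectional(word_vectors, hidden_size=3, seed=0):
    """Return a sentence matrix with one row (forward + backward state) per word."""
    rng = random.Random(seed)  # placeholder weights, not trained parameters
    dim = len(word_vectors[0])
    fwd_in = random_matrix(hidden_size, dim, rng)
    fwd_rec = random_matrix(hidden_size, hidden_size, rng)
    bwd_in = random_matrix(hidden_size, dim, rng)
    bwd_rec = random_matrix(hidden_size, hidden_size, rng)

    forward = rnn_pass(word_vectors, fwd_in, fwd_rec)
    # Feed the sequence in reverse, then restore the original word order.
    backward = rnn_pass(word_vectors[::-1], bwd_in, bwd_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


sentence_matrix = encode_bidirectional([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each row of `sentence_matrix` combines what the forward pass has seen up to that word with what the reverse pass has seen after it, which is one common way to give every word a representation in context.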
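Further, in the third embodiment, step S204 includes:
extracting a context vector from the sentence matrix through the preset attention model;
compressing the sentence matrix into a sentence vector according to the context vector, and using the sentence vector as the standardized diagnosis and treatment data.
It can be understood that the context vector expresses the contextual relationships among the word vectors. Extracting a context vector from the sentence matrix through the preset attention model and compressing the sentence matrix into a sentence vector according to it improves the accuracy and completeness of the sentence vector, and thus yields accurate standardized diagnosis and treatment data.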
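One common way to compress a sentence matrix with a context vector is softmax-weighted pooling, sketched below. The source does not give the attention formula, so the dot-product scoring and the choice to pass the context vector in as a parameter (rather than derive it inside the model) are assumptions of this example.

```python
import math


def attention_pool(sentence_matrix, context_vector):
    """Weight each row by its softmax-normalized similarity to the context vector."""
    scores = [
        sum(h * u for h, u in zip(row, context_vector))
        for row in sentence_matrix
    ]
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(sentence_matrix[0])
    sentence_vector = [
        sum(w * row[k] for w, row in zip(weights, sentence_matrix))
        for k in range(dim)
    ]
    return sentence_vector, weights


matrix = [[1.0, 0.0], [0.0, 1.0]]
uniform_vector, uniform_weights = attention_pool(matrix, [0.0, 0.0])
focused_vector, focused_weights = attention_pool(matrix, [10.0, 0.0])
```

With a zero context vector every row receives the same weight and the result is the row average; a context vector aligned with the first row shifts nearly all the weight onto that row.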
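In the third embodiment, word segmentation is performed on the diagnosis and treatment data to generate a word sequence; the words in the word sequence are converted into word vectors to generate a corresponding word vector sequence; the word vector sequence is encoded into a sentence matrix according to a preset bidirectional recurrent neural network model; and the sentence matrix is compressed into a sentence vector through a preset attention model, the sentence vector being used as the standardized diagnosis and treatment data. Relying on the context vector improves the efficiency and accuracy of generating the standardized diagnosis and treatment data.
In addition, an embodiment of this application further proposes a storage medium, which may be a non-volatile readable storage medium. The storage medium stores big data-based abnormal data identification computer-readable instructions which, when executed by a processor, implement the steps of the big data-based abnormal data identification method described above. For the method implemented when the computer-readable instructions are executed, refer to the embodiments of the big data-based abnormal data identification method of this application; the details are not repeated here.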
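In addition, referring to FIG. 5, an embodiment of this application further proposes a big data-based abnormal data identification apparatus, which includes:
an acquisition module 10, configured to acquire a patient's diagnosis and treatment data;
a processing module 20, configured to standardize the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
a mining module 30, configured to perform anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify the duplicate charging data in the standardized diagnosis and treatment data.
In an embodiment, the mining module 30 is further configured to generate a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item; split the fee summary table into a package fee table and a single-item fee table according to the type of the charging item; and determine, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item, taking the charging records that contain the same charging item as duplicate charging data.
In an embodiment, the mining module 30 is further configured to traverse the package fee table through the preset nested loop algorithm and select a first charging record; match the first charging record in turn against multiple second charging records in the single-item fee table and determine from the matching result whether the first charging record and a second charging record contain the same charging item; and take the first charging record and the second charging record that contain the same charging item as duplicate charging data.
In an embodiment, the mining module 30 is further configured to compare the number of records in the package fee table with the number of records in the single-item fee table; and, when the number of records in the package fee table is lower than the number of records in the single-item fee table, perform the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.
In an embodiment, the processing module 20 is further configured to perform word segmentation on the diagnosis and treatment data to generate a word sequence; convert the words in the word sequence into word vectors to generate a corresponding word vector sequence; encode the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model; and compress the sentence matrix into a sentence vector through a preset attention model, using the sentence vector as the standardized diagnosis and treatment data.
In an embodiment, the processing module 20 is further configured to input the word vector sequence into the preset bidirectional recurrent neural network model first in the forward direction and then in the reverse direction, so that the model encodes the word vector sequence and outputs a sentence matrix.
In an embodiment, the processing module 20 is further configured to extract a context vector from the sentence matrix through the preset attention model; and compress the sentence matrix into a sentence vector according to the context vector, using the sentence vector as the standardized diagnosis and treatment data.
For other embodiments or specific implementations of the big data-based abnormal data identification apparatus of this application, refer to the method embodiments above; the details are not repeated here.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.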

Claims (20)

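  1. A big data-based abnormal data identification method, characterized in that the method comprises the following steps:
    acquiring a patient's diagnosis and treatment data;
    standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
    performing anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.
  2. The big data-based abnormal data identification method according to claim 1, characterized in that performing anomaly mining on the standardized diagnosis and treatment data through the preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data comprises:
    generating a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item;
    splitting the fee summary table into a package fee table and a single-item fee table according to the type of the charging item;
    determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item;
    if the package fee table and the single-item fee table contain charging records with the same charging item, taking the charging records that contain the same charging item as duplicate charging data.
  3. The big data-based abnormal data identification method according to claim 2, characterized in that determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item comprises:
    traversing the package fee table through the preset nested loop algorithm and selecting a first charging record;
    matching the first charging record in turn against multiple second charging records in the single-item fee table, and determining from the matching result whether the first charging record and a second charging record contain the same charging item;
    and taking the charging records that contain the same charging item as duplicate charging data, if the package fee table and the single-item fee table contain such records, comprises:
    if the first charging record and the second charging record contain the same charging item, taking the first charging record and the second charging record that contain the same charging item as duplicate charging data.
  4. The big data-based abnormal data identification method according to claim 3, characterized in that, before traversing the package fee table through the preset nested loop algorithm and selecting the first charging record, the method further comprises:
    comparing the number of records in the package fee table with the number of records in the single-item fee table;
    if the number of records in the package fee table is lower than the number of records in the single-item fee table, performing the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.
  5. The big data-based abnormal data identification method according to claim 1, characterized in that standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data comprises:
    performing word segmentation on the diagnosis and treatment data to generate a word sequence;
    converting the words in the word sequence into word vectors to generate a corresponding word vector sequence;
    encoding the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model;
    compressing the sentence matrix into a sentence vector through a preset attention model, and using the sentence vector as the standardized diagnosis and treatment data.
  6. The big data-based abnormal data identification method according to claim 5, characterized in that encoding the word vector sequence into a sentence matrix according to the preset bidirectional recurrent neural network model comprises:
    inputting the word vector sequence into the preset bidirectional recurrent neural network model first in the forward direction and then in the reverse direction, so that the model encodes the word vector sequence and outputs a sentence matrix.
  7. The big data-based abnormal data identification method according to claim 6, characterized in that compressing the sentence matrix into a sentence vector through the preset attention model and using the sentence vector as the standardized diagnosis and treatment data comprises:
    extracting a context vector from the sentence matrix through the preset attention model;
    compressing the sentence matrix into a sentence vector according to the context vector, and using the sentence vector as the standardized diagnosis and treatment data.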
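  8. A big data-based abnormal data identification device, characterized in that the device comprises a memory, a processor, and big data-based abnormal data identification computer-readable instructions stored in the memory and executable on the processor, the computer-readable instructions, when executed by the processor, implementing the following steps:
    acquiring a patient's diagnosis and treatment data;
    standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
    performing anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.
  9. The big data-based abnormal data identification device according to claim 8, characterized in that performing anomaly mining on the standardized diagnosis and treatment data through the preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data comprises:
    generating a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item;
    splitting the fee summary table into a package fee table and a single-item fee table according to the type of the charging item;
    determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item;
    if the package fee table and the single-item fee table contain charging records with the same charging item, taking the charging records that contain the same charging item as duplicate charging data.
  10. The big data-based abnormal data identification device according to claim 9, characterized in that determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item comprises:
    traversing the package fee table through the preset nested loop algorithm and selecting a first charging record;
    matching the first charging record in turn against multiple second charging records in the single-item fee table, and determining from the matching result whether the first charging record and a second charging record contain the same charging item;
    and taking the charging records that contain the same charging item as duplicate charging data, if the package fee table and the single-item fee table contain such records, comprises:
    if the first charging record and the second charging record contain the same charging item, taking the first charging record and the second charging record that contain the same charging item as duplicate charging data.
  11. The big data-based abnormal data identification device according to claim 10, characterized in that, before traversing the package fee table through the preset nested loop algorithm and selecting the first charging record, the steps further comprise:
    comparing the number of records in the package fee table with the number of records in the single-item fee table;
    if the number of records in the package fee table is lower than the number of records in the single-item fee table, performing the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.
  12. The big data-based abnormal data identification device according to claim 8, characterized in that standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data comprises:
    performing word segmentation on the diagnosis and treatment data to generate a word sequence;
    converting the words in the word sequence into word vectors to generate a corresponding word vector sequence;
    encoding the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model;
    compressing the sentence matrix into a sentence vector through a preset attention model, and using the sentence vector as the standardized diagnosis and treatment data.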
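  13. A big data-based abnormal data identification apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to acquire a patient's diagnosis and treatment data;
    a processing module, configured to standardize the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
    a mining module, configured to perform anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.
  14. The big data-based abnormal data identification apparatus according to claim 13, characterized in that the mining module is further configured to generate a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item;
    split the fee summary table into a package fee table and a single-item fee table according to the type of the charging item;
    determine, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item;
    and, if the package fee table and the single-item fee table contain charging records with the same charging item, take the charging records that contain the same charging item as duplicate charging data.
  15. The big data-based abnormal data identification apparatus according to claim 14, characterized in that the mining module is further configured to traverse the package fee table through the preset nested loop algorithm and select a first charging record;
    match the first charging record in turn against multiple second charging records in the single-item fee table, and determine from the matching result whether the first charging record and a second charging record contain the same charging item;
    and, if the first charging record and the second charging record contain the same charging item, take the first charging record and the second charging record that contain the same charging item as duplicate charging data.
  16. The big data-based abnormal data identification apparatus according to claim 15, characterized in that the mining module is further configured to compare the number of records in the package fee table with the number of records in the single-item fee table;
    and, if the number of records in the package fee table is lower than the number of records in the single-item fee table, perform the step of traversing the package fee table through the preset nested loop algorithm and selecting the first charging record.
  17. The big data-based abnormal data identification apparatus according to claim 13, characterized in that the processing module is further configured to perform word segmentation on the diagnosis and treatment data to generate a word sequence;
    convert the words in the word sequence into word vectors to generate a corresponding word vector sequence;
    encode the word vector sequence into a sentence matrix according to a preset bidirectional recurrent neural network model;
    and compress the sentence matrix into a sentence vector through a preset attention model, using the sentence vector as the standardized diagnosis and treatment data.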
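  18. A storage medium, characterized in that the storage medium stores big data-based abnormal data identification computer-readable instructions which, when executed by a processor, implement the following steps:
    acquiring a patient's diagnosis and treatment data;
    standardizing the diagnosis and treatment data to obtain standardized diagnosis and treatment data;
    performing anomaly mining on the standardized diagnosis and treatment data through a preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data.
  19. The storage medium according to claim 18, characterized in that performing anomaly mining on the standardized diagnosis and treatment data through the preset nested loop algorithm to identify duplicate charging data in the standardized diagnosis and treatment data comprises:
    generating a fee summary table from the standardized diagnosis and treatment data, where the fee summary table includes a number of charging records and each charging record includes at least a charging item;
    splitting the fee summary table into a package fee table and a single-item fee table according to the type of the charging item;
    determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item;
    if the package fee table and the single-item fee table contain charging records with the same charging item, taking the charging records that contain the same charging item as duplicate charging data.
  20. The storage medium according to claim 19, characterized in that determining, through the preset nested loop algorithm, whether the package fee table and the single-item fee table contain charging records with the same charging item comprises:
    traversing the package fee table through the preset nested loop algorithm and selecting a first charging record;
    matching the first charging record in turn against multiple second charging records in the single-item fee table, and determining from the matching result whether the first charging record and a second charging record contain the same charging item;
    and taking the charging records that contain the same charging item as duplicate charging data, if the package fee table and the single-item fee table contain such records, comprises:
    if the first charging record and the second charging record contain the same charging item, taking the first charging record and the second charging record that contain the same charging item as duplicate charging data.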
PCT/CN2019/118839 2018-12-13 2019-11-15 Big data-based abnormal data identification method, device, storage medium, and apparatus WO2020119386A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811530843.7A CN109615377A (zh) 2018-12-13 2018-12-13 Big data-based duplicate charging identification method, device, storage medium, and apparatus
CN201811530843.7 2018-12-13

Publications (1)

Publication Number Publication Date
WO2020119386A1 (zh) 2020-06-18

Family

ID=66009400

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118839 WO2020119386A1 (zh) 2018-12-13 2019-11-15 Big data-based abnormal data identification method, device, storage medium, and apparatus

Country Status (2)

Country Link
CN (1) CN109615377A (zh)
WO (1) WO2020119386A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
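CN109615377A (zh) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Big data-based duplicate charging identification method, device, storage medium, and apparatus
CN113657516A (zh) * 2021-08-20 2021-11-16 泰康保险集团股份有限公司 Medical transaction data processing method and apparatus, electronic device, and storage medium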

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
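CN107145587A (zh) * 2017-05-11 2017-09-08 成都四方伟业软件股份有限公司 Medical insurance anti-fraud system based on big data mining
CN107833595A (zh) * 2017-10-12 2018-03-23 山东大学 Multi-center integration platform and method for medical big data
CN108182963A (zh) * 2017-12-14 2018-06-19 山东浪潮云服务信息科技有限公司 Medical data processing method and apparatus
CN109615377A (zh) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Big data-based duplicate charging identification method, device, storage medium, and apparatus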

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
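CN102074069A (zh) * 2011-01-13 2011-05-25 易联众信息技术股份有限公司 Self-service settlement method for outpatient diagnosis and treatment fees
CN102169491B (zh) * 2011-03-25 2012-11-21 暨南大学 Method for dynamically detecting duplicate records across multiple data sets
CN106228023B (zh) * 2016-08-01 2018-08-28 清华大学 Clinical pathway mining method based on ontology and topic model
CN106326642A (zh) * 2016-08-16 2017-01-11 成都中医药大学 Method for establishing a medical consultation fee lattice model based on big data analysis
CN108566627A (zh) * 2017-11-27 2018-09-21 浙江鹏信信息科技股份有限公司 Method and system for identifying fraudulent text messages using deep learning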

Also Published As

Publication number Publication date
CN109615377A (zh) 2019-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19894698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19894698

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/10/2021)