CN111739648A - Data anomaly detection method and device, electronic equipment and storage medium - Google Patents

Data anomaly detection method and device, electronic equipment and storage medium

Info

Publication number
CN111739648A
Authority
CN
China
Prior art keywords
medical
value range
sampling
determining
numerical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010591658.XA
Other languages
Chinese (zh)
Inventor
吴莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202010591658.XA
Publication of CN111739648A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The invention relates to the technical field of artificial intelligence and provides a data anomaly detection method comprising the following steps: receiving a medical data sample; extracting medical numerical features from the medical data sample; determining an interval of the medical numerical features by an interval estimation method, and taking the two endpoints of the interval as a first value range; determining, by a random sampling method, the sampling value range obtained each time the medical numerical features are sampled, and determining a second value range from the sampling value ranges of the individual draws; weighting the first value range and the second value range to determine a learned estimated value range; when medical insurance data is received, extracting a numerical feature to be detected from the medical insurance data; judging whether the numerical feature to be detected falls outside the estimated value range; and if so, determining that the medical insurance data is abnormal. The invention also relates to blockchain technology, and the medical insurance data can be uploaded to a blockchain. The method can be applied to smart healthcare scenarios, thereby promoting the construction of smart cities.

Description

Data anomaly detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data anomaly detection method and device, electronic equipment and a storage medium.
Background
With rising social and economic levels, people pay more and more attention to health and are enrolling in medical insurance in large numbers, so the volume of medical insurance data keeps growing. However, because medical insurance fraud and unreasonable medical practices take many forms, some non-compliant data inevitably appears in this large volume of medical insurance data. For example, some records show oxygen inhalation of more than 24 hours in a single day, and others show drug dosages far beyond what could actually be used. This seriously wastes medical resources and also poses certain medical risks.
Therefore, how to identify the abnormality of the medical data is an urgent technical problem to be solved.
Disclosure of Invention
In view of the above, it is desirable to provide a data abnormality detection method, apparatus, electronic device and storage medium, which can perform abnormality identification on medical data.
A first aspect of the present invention provides a data anomaly detection method, including:
receiving an input medical data sample;
extracting medical numerical features from the medical data sample;
determining an interval of the medical numerical type characteristic by adopting an interval estimation method, and determining two endpoints of the interval as a first value range;
determining a sampling value range obtained by sampling the medical numerical type characteristics each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time;
weighting the first value range and the second value range to determine a learned estimated value range;
when medical insurance data needing abnormal detection is received, extracting numerical characteristics to be detected from the medical insurance data;
judging whether the numerical characteristic to be detected exceeds the estimation value range or not;
and if the numerical characteristic to be detected exceeds the estimation value range, determining that the medical insurance data is abnormal.
In one possible implementation, the extracting the medical numerical feature from the medical data sample includes:
extracting from the medical data sample an original medical numerical-type feature comprised by the medical data sample; or
determining a derived data sample from the medical data sample and extracting derived medical numerical features from the derived data sample.
In a possible implementation manner, the determining the interval of the medical numerical feature by using an interval estimation method includes:
calculating the mean value of the medical numerical type characteristics by adopting an interval estimation method;
calculating a sampling error of the medical numerical-type feature;
and determining the interval of the medical numerical characteristic according to the average value and the sampling error by adopting an interval estimation method.
In a possible implementation manner, the determining, by using a random sampling method, a sampling range obtained by sampling the medical numerical feature each time includes:
determining the total length of the medical numerical type features, numbering the medical numerical type features according to the total length, and obtaining a plurality of continuous numbers;
repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values, wherein the plurality of groups of number sampling values together comprise all of the plurality of continuous numbers;
and determining, for each group of the number sampling values, a sampling value range of the medical numerical features corresponding to that group.
In a possible implementation manner, the repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values includes:
sampling the plurality of continuous numbers by random sampling with replacement to obtain a first number sampling value;
judging whether the first number sampling value contains all of the plurality of continuous numbers;
if the first number sampling value does not contain all of the plurality of continuous numbers, sampling the plurality of continuous numbers again to obtain a second number sampling value;
judging whether the first number sampling value and the second number sampling value together contain all of the plurality of continuous numbers;
and if the first number sampling value and the second number sampling value together contain all of the plurality of continuous numbers, determining the first number sampling value and the second number sampling value as the plurality of groups of number sampling values.
In a possible implementation manner, the weighting the first value range and the second value range, and determining the learned estimated value range includes:
acquiring a first error distribution corresponding to the first value range and acquiring a second error distribution corresponding to the second value range;
determining a first weight parameter of the first value range and a second weight parameter of the second value range according to the first error distribution and the second error distribution;
and weighting the first value range and the second value range according to the first weight parameter and the second weight parameter to obtain a learned estimated value range.
In a possible implementation manner, the data anomaly detection method further includes:
and uploading the abnormal medical insurance data to a blockchain.
A second aspect of the present invention provides a data abnormality detection device, comprising:
a receiving module for receiving an input medical data sample;
an extraction module for extracting medical numerical features from the medical data sample;
the first determination module is used for determining an interval of the medical numerical type characteristic by adopting an interval estimation method and determining two endpoints of the interval as a first value range;
the second determining module is used for determining a sampling value range obtained by sampling the medical numerical type feature each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time;
a third determining module, configured to weight the first value range and the second value range, and determine a learned estimated value range;
the extraction module is further used for extracting numerical characteristics to be detected from the medical insurance data when the medical insurance data needing to be subjected to the abnormal detection is received;
the judging module is used for judging whether the numerical characteristic to be detected falls outside the estimated value range;
and the fourth determining module is used for determining that the medical insurance data is abnormal if the numerical characteristic to be detected exceeds the estimation value range.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the data anomaly detection method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data anomaly detection method.
In the above technical solution, two value ranges of the medical features are determined by an interval estimation method and a random sampling method, and the two value ranges are weighted according to weight parameters to obtain an estimated value range, which can be used to judge whether medical data is abnormal. When a medical examination receipt requiring anomaly detection is subsequently received, the estimated value range can be used for the judgment. Compared with either of the two value ranges alone, the estimated value range judges anomalies in medical data more accurately and gives a better anomaly identification effect.
Drawings
FIG. 1 is a flow chart of a data anomaly detection method according to a preferred embodiment of the present invention.
FIG. 2 is a functional block diagram of a data anomaly detection apparatus according to a preferred embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a data anomaly detection method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the descriptions relating to "first", "second", "third", "fourth", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," "third," or "fourth" may explicitly or implicitly include at least one of the feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating a data anomaly detection method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.
S11, receiving an input medical data sample.
S12, extracting medical numerical features from the medical data sample.
Wherein the medical numerical characteristic is a numerical type of medical data.
Specifically, the extracting the medical numerical characteristic from the medical data sample includes:
extracting from the medical data sample an original medical numerical-type feature comprised by the medical data sample; or
determining a derived data sample from the medical data sample and extracting derived medical numerical features from the derived data sample.
Optionally, it may be determined whether the medical data sample contains a number; if it does, the medical numerical feature containing that number is extracted. For example, if a drug is administered continuously for 3 hours, the original value of 3 hours can be extracted as a medical numerical feature.
Optionally, if the medical data sample does not contain a number, a related derived data sample is looked up from the medical data sample, and a medical numerical feature containing a number is extracted from the derived data sample. For example, if the treatment plan in a medical diagnosis record uses a certain drug at its default dosage, the derived data sample giving the default dosage of that drug can be found from the diagnosis record, and the value of the default dosage is extracted as a derived medical numerical feature.
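The two extraction paths described above can be sketched as follows. This is a minimal illustration only: the record layout (`text` and `drug` keys), the `DEFAULT_DOSAGE_ML` lookup table, and its values are hypothetical and are not specified in the source.

```python
import re
from typing import Optional

# Hypothetical lookup table of default dosages in ml (illustrative values only).
DEFAULT_DOSAGE_ML = {"drug_a": 10.0}

# Matches the first integer or decimal number in a piece of text.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numeric_feature(record: dict) -> Optional[float]:
    """Return an original numeric feature if the record text contains a number,
    otherwise a derived feature looked up from the default-dosage table."""
    match = NUMBER.search(record.get("text", ""))
    if match:
        # Original medical numerical feature, e.g. "3" from "3 hours".
        return float(match.group())
    # No number in the sample: fall back to a derived data sample.
    return DEFAULT_DOSAGE_ML.get(record.get("drug"))
```

A record with neither a number nor a known drug yields `None`, which a caller could treat as "no numerical feature available".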
S13, determining the interval of the medical numerical characteristic by adopting an interval estimation method, and determining two end points of the interval as a first value range.
Specifically, the determining the interval of the medical numerical characteristic by using an interval estimation method includes:
calculating the mean value of the medical numerical type characteristics by adopting an interval estimation method;
calculating a sampling error of the medical numerical-type feature;
and determining the interval of the medical numerical characteristic according to the average value and the sampling error by adopting an interval estimation method.
Interval estimation builds on point estimation and gives an interval for the estimate of a population parameter; the interval is usually obtained by adding and subtracting the estimation error from a sample statistic.
The interval may be expressed as $(\mu - Z_{\alpha/2}\,\sigma,\ \mu + Z_{\alpha/2}\,\sigma)$, where $\mu$ is the mean, $\pm Z_{\alpha/2}\,\sigma$ is the sampling error, $\alpha$ is the area under the normal distribution that lies outside the confidence level, $Z_{\alpha/2}$ is the corresponding standard score, and $\sigma$ is the standard deviation. The value of $\alpha$ is typically determined by business requirements and can be set as needed, for example 0.1 or 0.5.
From statistics on historical data, the sampling error can be determined to be ±10% for 100 samples (for example, the injection amount of a certain drug for a certain disease), ±5% for 500 samples, and ±3% for 1200 or more samples.
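As a rough sketch of step S13, the interval formula can be evaluated with the Python standard library. The function name `first_value_range` is hypothetical, and using the population standard deviation for σ is an assumption; the source does not say which estimator is intended.

```python
from statistics import NormalDist, fmean, pstdev


def first_value_range(values, alpha=0.1):
    """Return (mu - z*sigma, mu + z*sigma) for the given feature values.

    alpha is the area outside the confidence level; z is the standard
    score Z_{alpha/2} of the standard normal distribution.
    """
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (mu - z * sigma, mu + z * sigma)
```

For example, `first_value_range([8, 9, 10, 11, 12], alpha=0.1)` gives an interval centered on the mean 10; a smaller `alpha` widens the interval.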
S14, determining a sampling value range obtained by sampling the medical numerical type feature each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time.
Specifically, the determining, by using a random sampling method, a sampling value range obtained by sampling the medical numerical characteristic each time includes:
determining the total length of the medical numerical type features, numbering the medical numerical type features according to the total length, and obtaining a plurality of continuous numbers;
repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values, wherein the plurality of groups of number sampling values together comprise all of the plurality of continuous numbers;
and determining, for each group of the number sampling values, a sampling value range of the medical numerical features corresponding to that group.
The total length of the medical numerical features can be calculated (for example, 500 injection-amount features for a certain drug) and the features numbered with consecutive integers, such as 1 to 500. Random sampling with replacement ensures that each draw produces as many samples as the original sample set while keeping the sampling random. Each draw yields one group of number sampling values; a column can be set up to record the numbers drawn, and sampling is repeated until every number has been drawn, that is, until the groups of number sampling values obtained together contain all of the continuous numbers. Each draw determines one group of medical numerical features, and the minimum and maximum of that group are taken as the sampling value range of the group.
Specifically, the repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values includes:
sampling the plurality of continuous numbers by random sampling with replacement to obtain a first number sampling value;
judging whether the first number sampling value contains all of the plurality of continuous numbers;
if the first number sampling value does not contain all of the plurality of continuous numbers, sampling the plurality of continuous numbers again to obtain a second number sampling value;
judging whether the first number sampling value and the second number sampling value together contain all of the plurality of continuous numbers;
and if the first number sampling value and the second number sampling value together contain all of the plurality of continuous numbers, determining the first number sampling value and the second number sampling value as the plurality of groups of number sampling values.
After each draw yields a group of number sampling values, it must be judged whether the numbers drawn so far contain all of the numbers; if not, sampling is repeated until every number has been drawn. "First" and "second" are merely examples indicating repeated sampling; sampling is not limited to two draws and may be performed three, four, or more times, which is not limited in the embodiments of the present invention.
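The repeated sampling of step S14 might look like the sketch below. The function names are hypothetical, numbering is 0-based rather than 1-based for convenience, and the way the per-draw ranges are combined into the second value range (averaging the minima and the maxima) is an assumption, since the source does not state the combination rule.

```python
import random


def sampling_ranges(values, seed=0):
    """Draw groups of len(values) numbers with replacement until every
    number has been drawn at least once; return each group's (min, max)."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    n = len(values)
    numbers = list(range(n))  # consecutive numbers assigned to the features
    seen = set()              # record of every number drawn so far
    ranges = []
    while len(seen) < n:
        group = rng.choices(numbers, k=n)  # random sampling with replacement
        seen.update(group)
        sampled = [values[i] for i in group]
        ranges.append((min(sampled), max(sampled)))
    return ranges


def second_value_range(ranges):
    """Assumed combination rule: average the per-draw minima and maxima."""
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return (sum(lows) / len(lows), sum(highs) / len(highs))
```

Because each draw covers only part of the numbers on average, several draws are usually needed before the loop ends, which matches the "first", "second", and further sampling described above.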
S15, weighting the first value range and the second value range, and determining the learned estimated value range.
Specifically, the weighting the first value range and the second value range, and determining the learned estimated value range includes:
acquiring a first error distribution corresponding to the first value range and acquiring a second error distribution corresponding to the second value range;
determining a first weight parameter of the first value range and a second weight parameter of the second value range according to the first error distribution and the second error distribution;
and weighting the first value range and the second value range according to the first weight parameter and the second weight parameter to obtain a learned estimated value range.
The weight parameters of the two value ranges can be set according to their error distributions: in general, the larger the error, the smaller the corresponding weight parameter, and the smaller the error, the larger the corresponding weight parameter.
In this method, the two value ranges are determined by an interval estimation method and a random sampling method and are weighted according to the weight parameters to obtain the estimated value range. Compared with either value range alone, the estimated value range judges anomalies in medical data more accurately and works better.
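One possible reading of the weighting in step S15 is sketched below. It assumes each error distribution is summarized by a single non-negative error value and that the weights are inversely proportional to those errors; neither the summary nor the exact weighting formula is given in the source, and `weighted_range` is a hypothetical name.

```python
def weighted_range(range1, range2, error1, error2):
    """Combine two (low, high) ranges with inverse-error weights.

    The range with the larger error receives the smaller weight,
    and the two weights sum to 1.
    """
    total = error1 + error2
    if error1 < 0 or error2 < 0 or total == 0:
        raise ValueError("errors must be non-negative and not both zero")
    w1 = error2 / total  # weight of the first value range
    w2 = error1 / total  # weight of the second value range
    low = w1 * range1[0] + w2 * range2[0]
    high = w1 * range1[1] + w2 * range2[1]
    return (low, high)
```

With errors 1.0 and 3.0, for instance, the first range gets weight 0.75 and the second 0.25, so the estimated range sits closer to the first range.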
S16, when medical insurance data requiring anomaly detection is received, extracting a numerical feature to be detected from the medical insurance data.
The numerical feature to be detected can be any medical feature that includes a number, such as 5 hours of oxygen inhalation or a drug dosage of 10 ml.
S17, judging whether the numerical feature to be detected falls outside the estimated value range.
S18, if the numerical feature to be detected falls outside the estimated value range, determining that the medical insurance data is abnormal.
Features within the estimated value range are normal features, and features outside it are abnormal features. When the numerical feature to be detected falls outside the estimated value range, the medical insurance data can be determined to be abnormal, for example when the oxygen inhalation time exceeds 24 hours or the dosage of a certain drug is beyond the estimated value range. When the numerical feature to be detected is within the estimated value range, the medical insurance data can be determined to be normal.
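The check in steps S17 and S18 reduces to a range comparison. A minimal sketch, in which `is_abnormal` is a hypothetical name and treating the endpoints themselves as normal is an assumption:

```python
def is_abnormal(value, estimated_range):
    """Return True if the value lies outside the estimated value range.

    Values equal to either endpoint are treated as normal (assumption).
    """
    low, high = estimated_range
    return value < low or value > high
```

With an estimated range of (0, 24) hours for daily oxygen inhalation, a record of 25 hours would be flagged and a record of 5 hours would not.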
Optionally, the data anomaly detection method further includes:
uploading the abnormal medical insurance data to a blockchain.
To ensure the privacy and security of the data, the abnormal medical insurance data can be uploaded to a blockchain for storage.
In the method flow described in fig. 1, two value ranges of the medical features are determined by an interval estimation method and a random sampling method, and the two value ranges are weighted according to weight parameters to obtain an estimated value range, which can be used to judge whether medical data is abnormal. When a medical examination receipt requiring anomaly detection is subsequently received, the estimated value range can be used for the judgment. Compared with either of the two value ranges alone, the estimated value range judges anomalies in medical data more accurately and gives a better anomaly identification effect.
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.
Referring to fig. 2, fig. 2 is a functional block diagram of a data anomaly detection apparatus according to a preferred embodiment of the present invention.
In some embodiments, the data anomaly detection apparatus operates in an electronic device. The data anomaly detection means may comprise a plurality of functional modules consisting of program code segments. Program code of various program segments in the data anomaly detection apparatus may be stored in a memory and executed by at least one processor to perform some or all of the steps of the data anomaly detection method described in fig. 1.
In this embodiment, the data anomaly detection apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a receiving module 201, an extraction module 202, a first determining module 203, a second determining module 204, a third determining module 205, a judging module 206 and a fourth determining module 207. A module as referred to herein is a series of computer program segments that is stored in memory, can be executed by at least one processor, and performs a fixed function. The functions of the modules are described in detail below.
A receiving module 201, configured to receive an input medical data sample.
An extraction module 202 for extracting a medical numerical feature from the medical data sample.
Wherein the medical numerical characteristic is a numerical type of medical data.
Specifically, the extracting the medical numerical characteristic from the medical data sample includes:
extracting from the medical data sample an original medical numerical-type feature comprised by the medical data sample; or
determining a derived data sample from the medical data sample and extracting derived medical numerical features from the derived data sample.
Optionally, it may be determined whether the medical data sample contains a number; if it does, the medical numerical feature containing that number is extracted. For example, if a drug is administered continuously for 3 hours, the original value of 3 hours can be extracted as a medical numerical feature.
Optionally, if the medical data sample does not contain a number, a related derived data sample is looked up from the medical data sample, and a medical numerical feature containing a number is extracted from the derived data sample. For example, if the treatment plan in a medical diagnosis record uses a certain drug at its default dosage, the derived data sample giving the default dosage of that drug can be found from the diagnosis record, and the value of the default dosage is extracted as a derived medical numerical feature.
The first determining module 203 is configured to determine an interval of the medical numerical feature by using an interval estimation method, and determine two endpoints of the interval as a first value range.
Specifically, the determining the interval of the medical numerical characteristic by using an interval estimation method includes:
calculating the mean value of the medical numerical type characteristics by adopting an interval estimation method;
calculating a sampling error of the medical numerical-type feature;
and determining the interval of the medical numerical characteristic according to the average value and the sampling error by adopting an interval estimation method.
Interval estimation builds on point estimation and gives an interval for the estimate of a population parameter; the interval is usually obtained by adding and subtracting the estimation error from a sample statistic.
The interval may be expressed as $(\mu - Z_{\alpha/2}\,\sigma,\ \mu + Z_{\alpha/2}\,\sigma)$, where $\mu$ is the mean, $\pm Z_{\alpha/2}\,\sigma$ is the sampling error, $\alpha$ is the area under the normal distribution that lies outside the confidence level, $Z_{\alpha/2}$ is the corresponding standard score, and $\sigma$ is the standard deviation. The value of $\alpha$ is typically determined by business requirements and can be set as needed, for example 0.1 or 0.5.
From statistics on historical data, the sampling error can be determined to be ±10% for 100 samples (for example, the injection amount of a certain drug for a certain disease), ±5% for 500 samples, and ±3% for 1200 or more samples.
The second determining module 204 is configured to determine a sampling value range obtained by sampling the medical numerical feature each time by using a random sampling method, and determine a second value range according to the sampling value range each time.
Specifically, the determining, by using a random sampling method, a sampling value range obtained by sampling the medical numerical characteristic each time includes:
determining the total length of the medical numerical type features, numbering the medical numerical type features according to the total length, and obtaining a plurality of continuous numbers;
repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values, wherein the plurality of groups of number sampling values together comprise all of the plurality of continuous numbers;
and determining a sampling value range of the medical numerical type characteristic corresponding to the number sampling value for each group of the number sampling values.
The total length of the medical numerical features (for example, 500 injection-amount features of a certain drug) can be counted, and the features can be numbered with consecutive integers, such as 1 to 500. Random sampling with replacement keeps the number of samples drawn each time equal to the number of original samples and ensures that the sampling is random. Each round of sampling yields one group of number sampling values; a column can be set up to record them, and sampling is repeated until every number has been drawn, that is, the groups of number sampling values finally obtained together contain all of the consecutive numbers. Each round of sampling determines one group of medical numerical features, and the minimum and maximum values in that group are taken as the sampling value range of the group.
Specifically, repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values includes:
sampling the plurality of continuous numbers by random sampling with replacement to obtain a first number sampling value;
judging whether the first number sampling value contains all the plurality of continuous numbers;
if the first number sampling value does not contain all the plurality of continuous numbers, repeatedly sampling the plurality of continuous numbers to obtain a second number sampling value;
judging whether the first number sampling value and the second number sampling value contain all the plurality of continuous numbers or not;
and if the first number sampling value and the second number sampling value contain all the plurality of continuous numbers, determining that the first number sampling value and the second number sampling value are a plurality of groups of number sampling values.
After each round of sampling yields a group of number sampling values, it must be determined whether the numbers sampled so far include all the numbers; if not, sampling is repeated until every number has been sampled. "First" and "second" are merely examples denoting repeated sampling; sampling is not limited to two rounds and may take three, four, or more rounds, and the embodiment of the present invention is not limited in this respect.
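As a non-limiting illustration, the repeated sampling with replacement and the per-group sampling value range may be sketched as follows. The function names are hypothetical. The passage above does not state how the second value range is derived from the sampling value ranges, so averaging the lower endpoints and the upper endpoints is an assumption made here for illustration only.

```python
import random


def sample_until_covered(values, rng):
    """Draw groups with replacement until every number 1..n has been drawn."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    numbers = list(range(1, n + 1))  # consecutive numbers 1..n
    seen = set()
    groups, ranges = [], []
    while len(seen) < n:
        # Each round draws n numbers, the same size as the original sample.
        group = rng.choices(numbers, k=n)
        groups.append(group)
        seen.update(group)
        drawn = [values[i - 1] for i in group]
        # Sampling value range of this group: its minimum and maximum.
        ranges.append((min(drawn), max(drawn)))
    return groups, ranges


def second_value_range(ranges):
    """Assumption: average the lower and the upper endpoints over all groups."""
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return (sum(lows) / len(lows), sum(highs) / len(highs))
```

For example, `sample_until_covered(values, random.Random(0))` returns the groups of number sampling values together with one sampling value range per group; a seeded generator is used only to make the sketch reproducible.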
A third determining module 205, configured to weight the first value range and the second value range, and determine a learned estimated value range.
Specifically, the weighting the first value range and the second value range, and determining the learned estimated value range includes:
acquiring a first error distribution corresponding to the first value range and acquiring a second error distribution corresponding to the second value range;
determining a first weight parameter of the first value range and a second weight parameter of the second value range according to the first error distribution and the second error distribution;
and weighting the first value range and the second value range according to the first weight parameter and the second weight parameter to obtain a learned estimated value range.
The weight parameters of the two value ranges can be set according to their error distributions: in general, the larger the error, the smaller the corresponding weight parameter, and the smaller the error, the larger the corresponding weight parameter.
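As a non-limiting illustration, the weighting may be sketched as follows. The description only states that a larger error corresponds to a smaller weight; the inverse-error weights normalized to sum to 1, the summarizing of each error distribution as a single positive number, and the function name `weighted_range` are assumptions made for illustration.

```python
def weighted_range(first, second, first_error, second_error):
    """Combine two (low, high) ranges using inverse-error weights."""
    # Assumption: each error distribution is summarized as one positive number.
    inv1, inv2 = 1.0 / first_error, 1.0 / second_error
    w1 = inv1 / (inv1 + inv2)  # larger error gives a smaller weight
    w2 = 1.0 - w1
    low = w1 * first[0] + w2 * second[0]
    high = w1 * first[1] + w2 * second[1]
    return (low, high), (w1, w2)
```

With a first value range of (8, 12) at error 0.1 and a second value range of (6, 14) at error 0.3, the weights are 0.75 and 0.25 and the estimated value range is (7.5, 12.5).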
In the method, two value ranges are determined by the interval estimation method and the random sampling method and are weighted according to the weight parameters to obtain the estimated value range. Compared with either value range alone, the estimated value range judges abnormality in medical data more accurately and with better effect.
The extraction module 202 is further configured to extract the numerical characteristic to be detected from the medical insurance data when the medical insurance data requiring the abnormality detection is received.
The numerical characteristic to be detected can be various medical characteristics including numbers, such as 5 hours of oxygen inhalation and 10ml of medicine dosage.
The determining module 206 is configured to determine whether the to-be-detected numerical characteristic exceeds the estimated value range.
A fourth determining module 207, configured to determine that the medical insurance data is abnormal if the to-be-detected numerical characteristic exceeds the estimated value range.
Features within the estimated value range are normal features, and features outside it are abnormal features. When the numerical feature to be detected exceeds the estimated value range, the medical insurance data may be determined to be abnormal, for example when the oxygen inhalation time is more than 24 hours or the dosage of a certain drug falls outside the estimated value range. When the numerical feature to be detected is within the estimated value range, the medical insurance data can be determined to be normal.
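As a non-limiting illustration, the judgment may be sketched as follows. The function name `is_abnormal` and the treatment of the two endpoints as normal are assumptions; the range (0, 24) in the usage note is a placeholder echoing the oxygen inhalation example and is not a learned value.

```python
def is_abnormal(value, estimated_range):
    """Return True when the value lies outside the estimated value range."""
    low, high = estimated_range
    # Assumption: values equal to an endpoint are treated as normal.
    return value < low or value > high
```

For instance, with a placeholder range of (0, 24) hours, an oxygen inhalation time of 30 hours is judged abnormal and a time of 5 hours is judged normal.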
Optionally, the data anomaly detection apparatus further includes:
and the sending module is used for uploading the abnormal medical insurance data to the block chain.
In order to ensure the privacy and the security of the data, the medical insurance data with the abnormality can be uploaded to the block chain for storage.
In the data anomaly detection apparatus described in fig. 2, two value ranges of the medical feature are determined by an interval estimation method and a random sampling method, and the two value ranges are weighted according to the weight parameters to obtain an estimated value range, which can be used for abnormality determination of medical data. When a medical examination receipt requiring abnormality detection is subsequently received, the estimated value range can be used for the determination. In the invention, compared with either value range alone, the estimated value range judges abnormality in medical data more accurately and identifies abnormalities better.
As shown in fig. 3, fig. 3 is a schematic structural diagram of an electronic device implementing a data anomaly detection method according to a preferred embodiment of the present invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not limit the electronic device 3, which may include more or fewer components than those shown, combine some components, or use different components; for example, the electronic device 3 may further include an input/output device, a network access device, and the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic device 3, and the like. Further, the memory 31 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
With reference to fig. 1, the memory 31 in the electronic device 3 stores a plurality of instructions to implement a data anomaly detection method, and the processor 32 can execute the plurality of instructions to implement:
receiving an input medical data sample;
extracting medical numerical features from the medical data sample;
determining an interval of the medical numerical type characteristic by adopting an interval estimation method, and determining two endpoints of the interval as a first value range;
determining a sampling value range obtained by sampling the medical numerical type characteristics each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time;
weighting the first value range and the second value range to determine a learned estimated value range;
when medical insurance data needing abnormal detection is received, extracting numerical characteristics to be detected from the medical insurance data;
judging whether the numerical characteristic to be detected exceeds the estimation value range or not;
and if the numerical characteristic to be detected exceeds the estimation value range, determining that the medical insurance data is abnormal.
In an alternative embodiment, the extracting the medical numerical feature from the medical data sample comprises:
extracting from the medical data sample an original medical numerical-type feature comprised by the medical data sample; or
determining a derived data sample from the medical data sample and extracting derived medical numerical features from the derived data sample.
In an alternative embodiment, the determining the interval of the medical numerical feature by using an interval estimation method includes:
calculating the mean value of the medical numerical type characteristics by adopting an interval estimation method;
calculating a sampling error of the medical numerical-type feature;
and determining the interval of the medical numerical characteristic according to the average value and the sampling error by adopting an interval estimation method.
In an alternative embodiment, the determining, by using a random sampling method, a sampling value range obtained by sampling the medical numerical feature each time includes:
determining the total length of the medical numerical type features, numbering the medical numerical type features according to the total length, and obtaining a plurality of continuous numbers;
repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values, wherein the plurality of groups of number sampling values together comprise all of the plurality of continuous numbers;
and determining a sampling value range of the medical numerical type characteristic corresponding to the number sampling value for each group of the number sampling values.
In an optional embodiment, repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values includes:
sampling the plurality of continuous numbers by random sampling with replacement to obtain a first number sampling value;
judging whether the first number sampling value contains all the plurality of continuous numbers;
if the first number sampling value does not contain all the plurality of continuous numbers, repeatedly sampling the plurality of continuous numbers to obtain a second number sampling value;
judging whether the first number sampling value and the second number sampling value contain all the plurality of continuous numbers or not;
and if the first number sampling value and the second number sampling value contain all the plurality of continuous numbers, determining that the first number sampling value and the second number sampling value are a plurality of groups of number sampling values.
In an optional implementation, the weighting the first value range and the second value range, and determining the learned estimated value range includes:
acquiring a first error distribution corresponding to the first value range and acquiring a second error distribution corresponding to the second value range;
determining a first weight parameter of the first value range and a second weight parameter of the second value range according to the first error distribution and the second error distribution;
and weighting the first value range and the second value range according to the first weight parameter and the second weight parameter to obtain a learned estimated value range.
In an alternative embodiment, the processor 32 may execute the plurality of instructions to implement:
and uploading the abnormal medical insurance data to a block chain.
Specifically, for the implementation of the above instructions by the processor 32, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the electronic device 3 depicted in fig. 3, two value ranges of the medical feature are determined by an interval estimation method and a random sampling method, and the two value ranges are weighted according to the weight parameters to obtain an estimated value range, which can be used for abnormality determination of medical data. When a medical examination receipt requiring abnormality detection is subsequently received, the estimated value range can be used for the determination. In the invention, compared with either value range alone, the estimated value range judges abnormality in medical data more accurately and identifies abnormalities better.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, and Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. The units or means recited in the system claims may also be implemented by software or hardware.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A data anomaly detection method is characterized by comprising the following steps:
receiving an input medical data sample;
extracting medical numerical features from the medical data sample;
determining an interval of the medical numerical type characteristic by adopting an interval estimation method, and determining two endpoints of the interval as a first value range;
determining a sampling value range obtained by sampling the medical numerical type characteristics each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time;
weighting the first value range and the second value range to determine a learned estimated value range;
when medical insurance data needing abnormal detection is received, extracting numerical characteristics to be detected from the medical insurance data;
judging whether the numerical characteristic to be detected exceeds the estimation value range or not;
and if the numerical characteristic to be detected exceeds the estimation value range, determining that the medical insurance data is abnormal.
2. The data anomaly detection method according to claim 1, wherein said extracting a medical numerical feature from said medical data sample comprises:
extracting from the medical data sample an original medical numerical-type feature comprised by the medical data sample; or
determining a derived data sample from the medical data sample and extracting derived medical numerical features from the derived data sample.
3. The data anomaly detection method according to claim 1, wherein said determining an interval of said medical numerical feature using an interval estimation method comprises:
calculating the mean value of the medical numerical type characteristics by adopting an interval estimation method;
calculating a sampling error of the medical numerical-type feature;
and determining the interval of the medical numerical characteristic according to the average value and the sampling error by adopting an interval estimation method.
4. The data anomaly detection method according to claim 1, wherein said determining, by using a random sampling method, a sampling value range obtained by sampling the medical numerical feature each time comprises:
determining the total length of the medical numerical type features, numbering the medical numerical type features according to the total length, and obtaining a plurality of continuous numbers;
repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values, wherein the plurality of groups of number sampling values together comprise all of the plurality of continuous numbers;
and determining a sampling value range of the medical numerical type characteristic corresponding to the number sampling value for each group of the number sampling values.
5. The method according to claim 4, wherein the repeatedly sampling the plurality of continuous numbers by random sampling with replacement to obtain a plurality of groups of number sampling values comprises:
sampling the plurality of continuous numbers by random sampling with replacement to obtain a first number sampling value;
judging whether the first number sampling value contains all the plurality of continuous numbers;
if the first number sampling value does not contain all the plurality of continuous numbers, repeatedly sampling the plurality of continuous numbers to obtain a second number sampling value;
judging whether the first number sampling value and the second number sampling value contain all the plurality of continuous numbers or not;
and if the first number sampling value and the second number sampling value contain all the plurality of continuous numbers, determining that the first number sampling value and the second number sampling value are a plurality of groups of number sampling values.
6. The data anomaly detection method of claim 1, wherein said weighting the first range of values and the second range of values and determining a learned estimated range of values comprises:
acquiring a first error distribution corresponding to the first value range and acquiring a second error distribution corresponding to the second value range;
determining a first weight parameter of the first value range and a second weight parameter of the second value range according to the first error distribution and the second error distribution;
and weighting the first value range and the second value range according to the first weight parameter and the second weight parameter to obtain a learned estimated value range.
7. The data anomaly detection method according to claim 1, further comprising:
and uploading the abnormal medical insurance data to a block chain.
8. A data abnormality detection device, characterized in that the data abnormality detection device comprises:
a receiving module for receiving an input medical data sample;
an extraction module for extracting medical numerical features from the medical data sample;
the first determination module is used for determining an interval of the medical numerical type characteristic by adopting an interval estimation method and determining two endpoints of the interval as a first value range;
the second determining module is used for determining a sampling value range obtained by sampling the medical numerical type feature each time by adopting a random sampling method, and determining a second value range according to the sampling value range each time;
a third determining module, configured to weight the first value range and the second value range, and determine a learned estimated value range;
the extraction module is further used for extracting numerical characteristics to be detected from the medical insurance data when the medical insurance data needing to be subjected to the abnormal detection is received;
the judging module is used for judging whether the numerical characteristic to be detected exceeds the estimated value range;
and the fourth determining module is used for determining that the medical insurance data is abnormal if the numerical characteristic to be detected exceeds the estimation value range.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the data anomaly detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements a data anomaly detection method as claimed in any one of claims 1 to 7.
CN202010591658.XA 2020-06-24 2020-06-24 Data anomaly detection method and device, electronic equipment and storage medium Pending CN111739648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010591658.XA CN111739648A (en) 2020-06-24 2020-06-24 Data anomaly detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010591658.XA CN111739648A (en) 2020-06-24 2020-06-24 Data anomaly detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111739648A true CN111739648A (en) 2020-10-02

Family

ID=72651100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010591658.XA Pending CN111739648A (en) 2020-06-24 2020-06-24 Data anomaly detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111739648A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108784655A (en) * 2017-04-28 2018-11-13 西门子保健有限责任公司 Rapid evaluation for medical patient and consequences analysis
CN109635113A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Abnormal insured people purchases medicine data detection method, device, equipment and storage medium
CN110033019A (en) * 2019-03-06 2019-07-19 腾讯科技(深圳)有限公司 Method for detecting abnormality, device and the storage medium of human body
CN110765414A (en) * 2019-09-11 2020-02-07 深信服科技股份有限公司 Performance index data evaluation method, device, equipment and storage medium
CN111261298A (en) * 2019-12-25 2020-06-09 南京医康科技有限公司 Medical data quality pre-judging method and device, readable medium and electronic equipment
CN111261289A (en) * 2018-11-30 2020-06-09 上海图灵医疗科技有限公司 Heart disease detection method based on artificial intelligence model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112379859A (en) * 2020-11-13 2021-02-19 北京灵汐科技有限公司 Binary sampling processing method and device and countermeasure sample generating method and device
CN112379859B (en) * 2020-11-13 2023-08-18 北京灵汐科技有限公司 Binary sampling processing method and device and countermeasure sample generation method and device
CN113449009A (en) * 2021-03-30 2021-09-28 广州朗国电子科技股份有限公司 Intelligent management method, equipment and medium for animal husbandry production management

Similar Documents

Publication Publication Date Title
JP6878450B2 (en) Methods and devices to prevent advertising fraud and storage media
CN108876636B (en) Intelligent air control method, system, computer equipment and storage medium for claim settlement
CN109636623A (en) Medical data method for detecting abnormality, device, equipment and storage medium
CN105354721B (en) Method and device for identifying machine operation behavior
CN106302534B (en) A kind of method and system of detection and processing illegal user
CN111739648A (en) Data anomaly detection method and device, electronic equipment and storage medium
CN107222511B (en) Malicious software detection method and device, computer device and readable storage medium
CN109523399A (en) A kind of medical data processing method, device, equipment and storage medium
CN111818066B (en) Risk detection method and device
CN110825818A (en) Multi-dimensional feature construction method and device, electronic equipment and storage medium
CN108197795A (en) The account recognition methods of malice group, device, terminal and storage medium
CN106161389B (en) Cheating identification method and device and terminal
CN110807117A (en) User relationship prediction method and device and computer readable storage medium
CN108197885A (en) The Work attendance method and device of employee
CN110876072B (en) Batch registered user identification method, storage medium, electronic device and system
CN112437034A (en) False terminal detection method and device, storage medium and electronic device
CN113657550A (en) Patient marking method, device, equipment and storage medium based on hierarchical calculation
CN109062638B (en) System component display method, computer readable storage medium and terminal device
CN111128233A (en) Recording detection method and device, electronic equipment and storage medium
CA2945450A1 (en) Patient medication adherence and intervention using trajectory patterns
CN116402596A (en) Data analysis method, device, computer equipment and readable storage medium
CN110717653A (en) Risk identification method and device and electronic equipment
CN109830290A (en) Method for managing medical information, device, server and readable storage medium storing program for executing
CN114529994A (en) Information processing method and related device
CN108629610B (en) Method and device for determining popularization information exposure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220523
Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.
Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001
Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.