WO2021114831A1 - Data abnormality detection method and device - Google Patents

Data abnormality detection method and device

Info

Publication number
WO2021114831A1
Authority
WO
WIPO (PCT)
Prior art keywords
test
clinical
data
knowledge
multiple sets
Prior art date
Application number
PCT/CN2020/118419
Other languages
English (en)
French (fr)
Inventor
李彦轩
廖希洋
孙行智
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021114831A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • This application relates to the field of data processing technology, and in particular to a data abnormality detection method and device.
  • Laboratory testing is an important part of clinical practice. It uses laboratory tools to evaluate a patient's health status and physiological function, and assists diagnosis and treatment in clinical medicine.
  • The inventor found that, under normal circumstances, tests are ordered by the doctor based on the patient's chief complaint and the doctor's own clinical experience. The whole testing process is therefore highly subjective, and because doctors differ in clinical experience, the tests they order differ as well, which easily leads to missed tests or redundant tests. Missed tests leave key clinical indicators unavailable, and redundant tests lengthen the testing process. There is still no method for detecting anomalies in the tests ordered by doctors, which results in low test accuracy and a poor user experience.
  • The embodiments of this application provide a data abnormality detection method and device, a server, and a computer-readable storage medium, which help improve the accuracy of data anomaly detection.
  • The first aspect of the embodiments of the present application provides a data abnormality detection method applied to a server, including: receiving a clinical data set and a test knowledge set, and generating a clinical test data set based on the clinical data set and the test knowledge set; performing a training operation using the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receiving data to be tested, inputting the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and determining, according to the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • The second aspect of the embodiments of the present application provides a data abnormality detection device, which is applied to a server.
  • The device includes a receiving unit, a training unit, and a judging unit.
  • The receiving unit is used to receive a clinical data set and a test knowledge set, and to generate a clinical test data set based on the clinical data set and the test knowledge set.
  • The training unit is used to perform a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain a trained target similarity ranking model.
  • The judging unit is used to receive the data to be tested, input the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and determine, based on the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • a third aspect of the embodiments of the present application provides a server.
  • the server includes a processor, an input device, an output device, and a memory.
  • The processor, input device, output device, and memory are connected to each other. The memory is used to store a computer program that includes program instructions, and the processor is configured to call the program instructions to: receive a clinical data set and a test knowledge set, and generate a clinical test data set based on the clinical data set and the test knowledge set; perform a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain a trained target similarity ranking model; and receive the data to be tested, input the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and judge, according to the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium that stores a computer program for electronic data exchange, where the computer program causes a computer to perform the following steps: receiving a clinical data set and a test knowledge set, and generating a clinical test data set according to the clinical data set and the test knowledge set; performing a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain a trained target similarity ranking model; and receiving data to be tested, inputting the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and judging, based on the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • the fifth aspect of the embodiments of the present application provides a computer program product, wherein the above-mentioned computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the above-mentioned computer program is operable to cause a computer to execute as implemented in this application.
  • the computer program product may be a software installation package.
  • Implementing the embodiments of this application has at least the following beneficial effects: multiple test knowledge corresponding to the data to be tested can be obtained from the trained target similarity ranking model, and the multiple test knowledge can be compared with the clinical test conclusion corresponding to the data to be tested to determine whether the clinical test conclusion is abnormal, which helps improve the efficiency of data abnormality detection.
  • FIG. 1A is a schematic flowchart of a method for detecting data anomaly according to an embodiment of this application.
  • FIG. 1B provides a schematic diagram of a network structure of a preset similarity ranking model according to an embodiment of this application.
  • FIG. 1C is a schematic diagram of a network flow of a method for detecting data anomaly according to an embodiment of this application.
  • FIG. 2 is a schematic flowchart of a method for detecting data anomaly according to an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a method for detecting data anomaly according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a server provided in an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of a data abnormality detection device provided in an embodiment of the application.
  • The servers mentioned in the embodiments of this application may include, but are not limited to, backend servers, component servers, cloud servers, data distribution system servers, or data distribution software servers. These are only examples and the list is not exhaustive.
  • FIG. 1A is a schematic flowchart of a data anomaly detection method provided by an embodiment of the present application, which is applied to a server.
  • the above method includes the following steps 101-103.
  • The clinical data set may include multiple sets of clinical data, the test knowledge set may include multiple sets of test knowledge, and the clinical test data set may include multiple clinical test data; any one of the multiple clinical test data may include one clinical data and one test knowledge.
  • The clinical data set can be obtained from historical clinical data and can include the patient's clinical symptoms and the tests actually ordered by the doctor, for example the patient's disease, blood routine data, urine routine data, electrocardiogram, and so on, which are not limited here. The test knowledge can include various symptoms and their corresponding common tests, for example methods, guidance, and recommendations for diagnosing viral and bacterial acute respiratory infections, or test standards such as the preparation and staining of blood smears in blood tests or the sample volume to be drawn.
  • The clinical data set and the test knowledge set can be stored in a node of a blockchain.
  • The blockchain referred to in the embodiments of the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
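  • The "data blocks linked by cryptographic methods" idea can be illustrated with a minimal hash chain. This is only a sketch under stated assumptions: the patent does not specify a block layout, so the `payload`, `prev_hash`, and `hash` fields and the helper names `make_block` and `chain_is_valid` are hypothetical, and consensus, networking, and encryption are left out entirely.

```python
import hashlib
import json


def make_block(payload, prev_hash):
    """Create a block whose hash covers its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


def chain_is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    for index, block in enumerate(chain):
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if index > 0 and block["prev_hash"] != chain[index - 1]["hash"]:
            return False
    return True


# Hypothetical payloads standing in for the stored data sets.
genesis = make_block({"kind": "clinical_data_set", "records": 2}, prev_hash="0" * 64)
second = make_block({"kind": "test_knowledge_set", "records": 2}, prev_hash=genesis["hash"])
chain = [genesis, second]
```

  • Changing the payload of any stored block without recomputing its hash makes `chain_is_valid` return `False`, which is the anti-counterfeiting property the paragraph above refers to.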
  • In step 101, generating a clinical test data set based on the clinical data set and the test knowledge set may include the following steps 11-13.
  • 11. Obtain multiple sets of clinical data from the clinical data set, where any set of clinical data in the multiple sets of clinical data includes: at least one clinical symptom and a clinical test conclusion corresponding to the at least one clinical symptom.
  • 12. Obtain multiple sets of test knowledge from the test knowledge set, where any set of test knowledge in the multiple sets of test knowledge includes: one disease and at least one symptom test knowledge corresponding to that disease.
  • 13. Generate the clinical test data set based on the multiple sets of clinical data and the multiple sets of test knowledge.
  • the above-mentioned symptoms may include at least one of the following: fever, chills, cough, runny nose, throat swelling, etc., which are not limited here.
  • the above-mentioned clinical test data set generated from the clinical data set and test knowledge set can be used as a training set for subsequent model training operations, which is conducive to the advancement of subsequent judgment operations on clinical test conclusions.
  • the above step 13 to generate the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge may include the following steps 131-132.
  • Jaccard similarity is an index used to measure the similarity between two data sets, here the test knowledge and the clinical data.
  • The multiple sets of clinical data and the multiple sets of test knowledge are evaluated to calculate multiple Jaccard similarities, and they are screened by Jaccard similarity to obtain the clinical test data set used for subsequent model training.
  • Specifically, the Jaccard similarity between any set of clinical data and each set of test knowledge can be calculated to obtain multiple Jaccard similarities, and the test knowledge corresponding to the maximum of these similarities is selected as the target test knowledge. Repeating these steps yields the clinical test data corresponding to each set of clinical data, and these clinical test data constitute the clinical test data set used for subsequent model training.
  • the foregoing step 131 calculating the multiple Jaccard similarities corresponding to the target clinical data and the multiple sets of test knowledge, may include the following steps:
  • the above-mentioned preset Jaccard calculation formula can be set by the user or the system defaults, and it is not limited here.
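  • The pairing procedure described above can be sketched as follows. The patent's preset Jaccard formula is not reproduced in this text, so the sketch assumes the standard definition J(A, B) = |A ∩ B| / |A ∪ B| over symptom sets. The function names, the dictionary fields (`symptoms`, `conclusion`, `disease`), and the toy records are all hypothetical.

```python
def jaccard(a, b):
    """Standard Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_clinical_test_pairs(clinical_data, test_knowledge):
    """Pair each clinical record with its highest-Jaccard test-knowledge entry."""
    pairs = []
    for record in clinical_data:
        scores = [jaccard(record["symptoms"], k["symptoms"]) for k in test_knowledge]
        best = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((record, test_knowledge[best], scores[best]))
    return pairs


# Hypothetical toy data; the field names are illustrative only.
clinical_data = [
    {"symptoms": {"fever", "cough", "runny nose"}, "conclusion": {"blood routine"}},
    {"symptoms": {"chills", "fever"}, "conclusion": {"blood routine", "urine routine"}},
]
test_knowledge = [
    {"disease": "acute respiratory infection",
     "symptoms": {"fever", "cough", "throat swelling"}},
    {"disease": "urinary tract infection", "symptoms": {"fever", "chills"}},
]
pairs = build_clinical_test_pairs(clinical_data, test_knowledge)
```

  • Each element of `pairs` plays the role of one clinical test data item: one clinical record together with the single test knowledge entry that scored highest against it.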
  • The preset similarity ranking model can be set by the user or by system default, which is not limited here; for example, the preset similarity ranking model can be a Deep Neural Network (DNN).
  • The clinical test data set can be used as training data for the preset similarity ranking model, so that the trained target similarity ranking model can optimize the items most similar to the input. This serves the requirement in the embodiments of the present application of optimizing the multiple test knowledge corresponding to the most similar clinical data.
  • In step 102, performing a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain a trained target similarity ranking model may include the following steps 21-25.
  • FIG. 1B is a schematic diagram of a network structure of a preset similarity ranking model provided by an embodiment of the application.
  • The preset similarity ranking model may include a DNN network model, and may also include a first attention layer and a second attention layer.
  • The multiple sets of clinical data can be used as the input of the first attention layer. The first attention layer contains a first vector with the same dimension as the multiple sets of clinical data, and it adjusts the values of the vector elements in the first vector through model training.
  • The value of any vector element in the first vector represents the importance that the DNN-based similarity ranking model assigns to that element. The higher the value, the greater the importance, and vice versa.
  • The multiple sets of test knowledge can be used as the input of the second attention layer of the preset similarity ranking model. The second attention layer contains a second vector with the same dimension as the multiple sets of test knowledge, and it adjusts the values of the vector elements in the second vector through model training.
  • The clinical feature weight vectors of the multiple sets of clinical data are obtained from the first attention layer, and the knowledge feature weight vectors of the multiple sets of test knowledge are obtained from the second attention layer. The clinical feature weight vectors and the knowledge feature weight vectors are each input into the DNN network, and a model training operation is performed to obtain the embedding vectors of the multiple sets of clinical data and the multiple sets of test knowledge, where the embedding vectors represent the similarity between the multiple sets of clinical data and the multiple sets of test knowledge.
  • The preset similarity ranking model can be updated based on the embedding vectors to obtain the final trained target similarity ranking model.
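  • The forward pass through the two attention layers and the DNN can be sketched in plain Python. This is an illustration under several assumptions that the source does not confirm: the attention scores are normalized with a softmax, a single tanh layer stands in for the DNN, the same network is shared by both inputs, and the inputs are multi-hot feature vectors. The class names `FeatureAttention` and `DenseTower` are hypothetical, and no training loop is shown.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class FeatureAttention:
    """Holds one trainable score per input feature (the first or second vector)."""

    def __init__(self, dim):
        # Training would adjust these scores; zeros give uniform weights.
        self.scores = [0.0] * dim

    def __call__(self, features):
        weights = softmax(self.scores)
        return [w * x for w, x in zip(weights, features)]


class DenseTower:
    """A single tanh layer standing in for the DNN that produces embeddings."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def __call__(self, vector):
        return [math.tanh(sum(w * x for w, x in zip(row, vector)))
                for row in self.weights]


clinical_attention = FeatureAttention(dim=4)   # first attention layer
knowledge_attention = FeatureAttention(dim=4)  # second attention layer
tower = DenseTower(in_dim=4, out_dim=3)        # shared network (an assumption)

clinical_vec = [1.0, 0.0, 1.0, 1.0]   # hypothetical multi-hot symptom encoding
knowledge_vec = [1.0, 1.0, 0.0, 1.0]  # hypothetical multi-hot knowledge encoding

clinical_embedding = tower(clinical_attention(clinical_vec))
knowledge_embedding = tower(knowledge_attention(knowledge_vec))
```

  • After training, the learned `scores` in each attention layer can be read directly as per-feature importance, which is the interpretability benefit the description attributes to the attention mechanism.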
  • Specifically, the embedding vector cosine loss function can be obtained and used to constrain the distribution of the embedding vectors, so that the cosine similarity of the embedding vectors is greater for matched clinical data and test knowledge and smaller for unmatched clinical data and test knowledge. The preset TopkRankLoss loss function can then be obtained and used to update the preset similarity ranking model, yielding the updated similarity ranking model.
  • The embedding vector cosine loss function can be set by the user or by system default, which is not limited here; the preset TopkRankLoss loss function can likewise be set by the user or by system default, and is not limited here.
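  • One common form of a cosine loss with the behaviour described above is sketched below. The source does not give the actual formula, so this is an assumed stand-in: matched pairs are penalized by 1 - cos, and unmatched pairs by max(0, cos - margin). The function names and the `margin` parameter are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_embedding_loss(u, v, matched, margin=0.0):
    """Push matched pairs toward cos = 1 and unmatched pairs below `margin`."""
    cos = cosine_similarity(u, v)
    return 1.0 - cos if matched else max(0.0, cos - margin)


# Identical embeddings: no loss if the pair is matched, full loss if it is not.
loss_matched = cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], matched=True)
loss_unmatched = cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], matched=False)
```

  • Minimizing this quantity over the training pairs raises the cosine similarity of matched clinical data and test knowledge and lowers it for unmatched pairs, which is the constraint on the embedding distribution that the text describes.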
  • The preset TopkRankLoss loss function can include:
  • x_i is any item in the similarity list calculated by the trained target similarity ranking model for any clinical test data in the clinical test data set;
  • maxK is the total length of the similarity list;
  • positive, simi_positive, and negative indicate that the clinical data in the clinical test data matches, approximately matches, and does not match the test knowledge, respectively.
  • When the clinical data matches the test knowledge, the test knowledge is in first place in the similarity list and the resulting loss is 0, so no correction is needed; when the clinical data and the test knowledge are in an approximate matching relationship or a mismatch relationship
  • k is an integer greater than 1.
  • The above algorithm introduces the preset TopkRankLoss loss function to meet the requirement, in the clinical test quality control scenario, of finding the several test items most similar to the patient. During training, the embedding vector cosine loss function is also introduced, and the two loss functions are used in an alternating training mechanism. As a result, the trained target similarity ranking model can not only optimize the item most similar to the input, but also meet the requirement for the several most similar test items in the scenario of the embodiments of the present application, which overcomes the limitation that existing methods can only optimize the single most similar item.
  • The embodiments of the application introduce an attention mechanism into the preset similarity ranking model, so that the model can learn the importance of each feature in the clinical data and the test knowledge during training. This makes the final clinical test results explainable and addresses the poor interpretability of neural network models.
  • Receive the data to be tested, input the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and judge, based on the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • the aforementioned data to be tested may include clinical data to be tested, and the clinical data may include disease, blood routine data, urine routine data, electrocardiogram, etc., which are not limited herein.
  • The data to be tested can be input into the trained target similarity ranking model to generate a similarity list and obtain multiple test knowledge.
  • Based on the positions of the multiple test knowledge in the similarity list, the server can further determine whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state. In other words, the clinical test conclusion can be quality-checked to determine its accuracy.
  • When a doctor facing a complicated condition finds it difficult to order a complete and accurate set of tests, the embodiments of this application can provide a quality safeguard, reducing the doctor's workload and improving work efficiency.
  • In step 103, inputting the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge may include the following steps 311-313.
  • The server inputs the data to be tested into the target similarity ranking model to obtain a similarity matrix corresponding to the data to be tested. The similarity matrix contains the Jaccard similarities between the data to be tested and the multiple test data in the target similarity ranking model.
  • The similarities in the similarity matrix can be sorted according to a chosen rule, for example from largest to smallest, to generate a sorted target similarity matrix.
  • Since a greater similarity indicates a better match, the first k similarities in the target similarity matrix can be determined, along with the k test knowledge corresponding to them. For example, the test knowledge corresponding to the first 5 similarities can be selected as the criterion for judging whether the clinical test conclusion is in an abnormal state, where k is an integer greater than 1.
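  • The sort-and-select step can be sketched as below. The similarity scores and test names are made-up illustrative values, and `top_k_test_knowledge` is a hypothetical helper name rather than anything defined in the patent.

```python
def top_k_test_knowledge(similarities, knowledge_names, k):
    """Return the k knowledge names with the highest similarity, best first."""
    ranked = sorted(zip(similarities, knowledge_names),
                    key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]


# Hypothetical similarity row for one piece of data to be tested.
scores = [0.12, 0.87, 0.45, 0.91, 0.30]
names = ["ECG", "blood routine", "urine routine", "throat swab", "chest X-ray"]
top2 = top_k_test_knowledge(scores, names, k=2)
```

  • With these values, `top2` holds the two highest-scoring entries in descending order of similarity; in the patent's example k would be 5.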
  • In step 103, judging whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state based on the multiple test knowledge may include the following steps 321-323.
  • If the clinical test conclusion includes the k test knowledge, determine whether the clinical test conclusion is consistent with the k test knowledge; if they are consistent, determine that the clinical test conclusion is in a non-abnormal state; if they are inconsistent, determine that the clinical test conclusion is in an abnormal state.
  • The server may determine whether the clinical test conclusion is in an abnormal state based on the multiple test knowledge and the clinical test conclusion corresponding to the data to be tested. The abnormal state may include at least one of the following: a multi-check state, a missed-check state, and so on. Specifically, when the clinical test conclusion includes the k test knowledge but is inconsistent with them, it can be determined that the clinical test conclusion is in the multi-check abnormal state; if the clinical test conclusion does not include any of the k test knowledge, it is determined to be in the missed-check abnormal state; if the clinical test conclusion includes the k test knowledge and is consistent with them, it is determined to be in a non-abnormal state.
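  • The three outcomes above can be sketched with set comparisons. This is one possible reading, not the patent's definitive rule: "consistent" is taken to mean the ordered tests equal the top-k set, "includes but inconsistent" to mean a strict superset, and, as an explicit assumption, any conclusion missing at least one of the k tests is treated as a missed check, since the source does not say how partial overlap is handled. The function name and state labels are hypothetical.

```python
def judge_conclusion(conclusion, top_k):
    """Classify the tests a doctor ordered against the model's top-k tests."""
    ordered, expected = set(conclusion), set(top_k)
    if ordered == expected:
        return "normal"        # includes the k tests and is consistent with them
    if expected <= ordered:
        return "multi_check"   # all k tests present, plus extra tests
    # Assumption: missing any expected test counts as a missed check.
    return "missed_check"


state_ok = judge_conclusion(["blood routine", "throat swab"],
                            ["throat swab", "blood routine"])
state_extra = judge_conclusion(["blood routine", "throat swab", "ECG"],
                               ["throat swab", "blood routine"])
state_missing = judge_conclusion(["ECG"], ["throat swab", "blood routine"])
```

  • Order does not matter in this sketch because both inputs are converted to sets before comparison.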
  • The server can input the multiple sets of clinical data and the multiple sets of test knowledge into the first attention layer and the second attention layer of the preset similarity ranking model, respectively, which output the clinical feature weight vectors of the multiple sets of clinical data and the knowledge feature weight vectors of the multiple sets of test knowledge. Both are input into the DNN network model to obtain the embedding vectors corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, where the embedding vectors represent the similarity between them.
  • Further, the embedding vector cosine loss function can be obtained and used to constrain the distribution of the embedding vectors, and the preset TopkRankLoss loss function can be obtained. The preset similarity ranking model can then be updated based on the preset TopkRankLoss loss function, and the updated model can be trained based on the embedding vectors to obtain the trained target similarity ranking model.
  • Finally, the server can input the data to be tested into the target similarity ranking model to obtain a similarity list, determine the multiple test knowledge corresponding to the data to be tested based on the similarity list, and compare the multiple test knowledge with the clinical test conclusion corresponding to the data to be tested to determine whether the clinical test conclusion is in an abnormal state. This helps improve the efficiency of clinical data testing.
  • The data abnormality detection method described in the embodiments of this application is applied to a server. It can receive a clinical data set and a test knowledge set, generate a clinical test data set based on them, and perform a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model. It can then receive the data to be tested, input it into the trained target similarity ranking model to obtain multiple test knowledge, and judge, based on the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state. In this way, multiple test knowledge corresponding to the data to be tested can be obtained from the trained target similarity ranking model and compared with the clinical test conclusion corresponding to the data to be tested to determine whether the clinical test conclusion is abnormal, which helps improve the efficiency of data abnormality detection.
  • FIG. 2 is an exemplary flow chart of a data anomaly detection method disclosed in an embodiment of the present application, which is applied to a server.
  • the data anomaly detection method may include the following steps 201-209.
  • A clinical data set and a test knowledge set are received, and multiple sets of clinical data are obtained from the clinical data set, where any set of clinical data in the multiple sets of clinical data includes: at least one clinical symptom and a clinical test conclusion corresponding to the at least one clinical symptom.
  • Multiple sets of test knowledge are obtained from the test knowledge set, where any set of test knowledge in the multiple sets of test knowledge includes: one disease and at least one symptom test knowledge corresponding to that disease.
  • Receive the data to be tested, input the data to be tested into the trained target similarity ranking model to obtain multiple test knowledge, and determine, based on the multiple test knowledge, whether the clinical test conclusion corresponding to the data to be tested is in an abnormal state.
  • The data abnormality detection method described in the embodiment of this application is applied to a server. It can receive a clinical data set and a test knowledge set, and obtain multiple sets of clinical data from the clinical data set, where any set of clinical data in the multiple sets of clinical data includes: at least one clinical symptom and a clinical test conclusion corresponding to the at least one clinical symptom; obtain multiple sets of test knowledge from the test knowledge set, where any set of test knowledge in the multiple sets of test knowledge includes: one disease and at least one symptom test knowledge corresponding to that disease; generate a clinical test data set based on the multiple sets of clinical data and the multiple sets of test knowledge; use the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model; use the multiple sets of test knowledge as the input of the second attention layer of the preset similarity ranking model; obtain the clinical feature weight vectors of the multiple sets of clinical data from the first attention layer, and obtain the knowledge feature weight vectors of the multiple sets of test knowledge from the second attention layer; input the clinical feature weight vectors and the knowledge feature weight vectors into the preset
  • FIG. 3 is an example flowchart of a data anomaly detection method disclosed in an embodiment of the present application, which is applied to a server.
  • the data anomaly detection method may include the following steps 301-306.
  • If the clinical test conclusion includes the k test knowledge, determine whether the clinical test conclusion is consistent with the k test knowledge; if they are consistent, determine that the clinical test conclusion is in a non-abnormal state; if they are inconsistent, determine that the clinical test conclusion is in an abnormal state.
  • The data abnormality detection method described in the embodiments of the application is applied to a server. It can receive a clinical data set and a test knowledge set, and generate a clinical test data set based on them; perform a training operation using the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model; and receive the data to be tested and input it into the trained target similarity ranking model to obtain multiple test knowledge, where the multiple test knowledge includes k test knowledge corresponding to the data to be tested, and k is an integer greater than 1.
  • It can then obtain the clinical test conclusion corresponding to the data to be tested and judge whether the clinical test conclusion contains the k test knowledge. If the clinical test conclusion contains the k test knowledge, it judges whether the clinical test conclusion is consistent with the k test knowledge: if they are consistent, the clinical test conclusion is determined to be in a non-abnormal state; if they are inconsistent, it is determined to be in an abnormal state. If the clinical test conclusion does not include any one of the k test knowledge, it is determined to be in an abnormal state. In this way, the top k test knowledge corresponding to the data to be tested can be determined from the trained target similarity ranking model and compared with the clinical test conclusion corresponding to the data to be tested to determine whether the clinical test conclusion is in an abnormal state, which helps improve the efficiency of data abnormality detection.
  • FIG. 4 is a schematic structural diagram of a server provided by an embodiment of the application. As shown in FIG. 4, it includes a processor, a communication interface, a memory, and one or more programs.
  • The processor, the communication interface, and the memory are connected to each other. The memory stores a computer program that includes program instructions, and the processor is configured to call those instructions. The one or more programs include instructions for performing the following steps: receiving a clinical data set and a test knowledge set and generating a clinical test data set from them; performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receiving the data to be inspected, inputting it into the trained model to obtain multiple items of test knowledge, and determining from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
  • The server described in the embodiments of this application can receive a clinical data set and a test knowledge set, generate a clinical test data set from them, and perform a training operation with the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model. It then receives the data to be inspected, inputs it into the trained model to obtain multiple items of test knowledge, and determines from those items whether the corresponding clinical test conclusion is in an abnormal state. In this way, the test knowledge obtained from the trained model is compared with the clinical test conclusion to decide whether that conclusion is abnormal, which helps improve the efficiency of data anomaly detection.
  • In terms of generating a clinical test data set from the clinical data set and the test knowledge set, the program includes instructions for performing the following steps: obtaining multiple sets of clinical data from the clinical data set, where each set includes at least one clinical symptom and the clinical test conclusion corresponding to that symptom; obtaining multiple sets of test knowledge from the test knowledge set, where each set includes one disease and at least one item of symptom test knowledge corresponding to that disease; and generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge.
  • In terms of generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge, the program includes instructions for performing the following steps: taking any set of clinical data as the target clinical data and calculating the Jaccard similarities between the target clinical data and each of the multiple sets of test knowledge; taking the test knowledge with the maximum Jaccard similarity as the target test knowledge and recording the mapping between the target clinical data and the target test knowledge as one clinical test data item; repeating these steps to obtain the clinical test data items for all sets of clinical data; and merging those items to obtain the clinical test data set.
  • In terms of calculating the Jaccard similarities between the target clinical data and the multiple sets of test knowledge, the program includes instructions for performing the following steps: obtaining a preset Jaccard calculation formula; taking any one of the multiple sets of test knowledge and using the target clinical data and that set as the input of the formula to obtain their Jaccard similarity; and repeating these steps to obtain multiple Jaccard similarities.
  • In terms of performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model, the program includes instructions for performing the following steps: using the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model; using the multiple sets of test knowledge as the input of the second attention layer; obtaining the clinical feature weight vector of the clinical data from the first attention layer and the knowledge feature weight vector of the test knowledge from the second attention layer; inputting both weight vectors into the preset similarity ranking model to obtain the embedding vectors corresponding to the clinical data and the test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge; and training the preset similarity ranking model on the embedding vectors to obtain the trained target similarity ranking model.
  • In terms of inputting the data to be inspected into the trained target similarity ranking model to obtain multiple items of test knowledge, the program includes instructions for performing the following steps: computing a similarity matrix for the data to be inspected with the trained target similarity ranking model; sorting the similarities in the matrix in descending order to obtain the sorted target similarity matrix; and taking the first k similarities of the target similarity matrix to obtain the k corresponding items of test knowledge. The multiple items of test knowledge include these k items, which correspond to the data to be inspected, and k is an integer greater than 1.
  • In terms of determining from the multiple items of test knowledge whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state, the program includes instructions for performing the following steps: obtaining the clinical test conclusion corresponding to the data to be inspected and determining whether it includes the k items of test knowledge; if it does, determining whether the conclusion is consistent with the k items, in which case it is in a non-abnormal state, and otherwise in an abnormal state; and if the conclusion includes none of the k items, determining that it is in an abnormal state.
  • the server includes hardware structures and/or software modules corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the server into functional units according to the foregoing method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 5 is a schematic structural diagram of a data anomaly detection device disclosed in an embodiment of the present application, which is applied to a server.
  • the device includes a receiving unit 501, a training unit 502, and a judging unit 503.
  • the receiving unit 501 is configured to receive a clinical data set and a test knowledge set, and generate a clinical test data set according to the clinical data set and the test knowledge set;
  • The training unit 502 is configured to perform a training operation with the clinical test data set as the training data of the preset similarity ranking model to obtain a trained target similarity ranking model.
  • The judging unit 503 is configured to receive the data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
  • the data abnormality detection device described in the embodiment of this application is applied to a server.
  • The device can receive a clinical data set and a test knowledge set and generate a clinical test data set from them.
  • The clinical test data set is used as the training data of the preset similarity ranking model in a training operation that yields the trained target similarity ranking model. The device then receives the data to be inspected, inputs it into the trained model to obtain multiple items of test knowledge, and determines from those items whether the corresponding clinical test conclusion is in an abnormal state.
  • In this way, the test knowledge obtained from the trained target similarity ranking model is compared with the clinical test conclusion corresponding to the data to be inspected to decide whether that conclusion is abnormal, which helps improve the efficiency of data anomaly detection.
  • In terms of generating the clinical test data set from the clinical data set and the test knowledge set, the receiving unit 501 is specifically configured to: obtain multiple sets of clinical data from the clinical data set, where each set includes at least one clinical symptom and the clinical test conclusion corresponding to that symptom; obtain multiple sets of test knowledge from the test knowledge set, where each set includes one disease and at least one item of symptom test knowledge corresponding to that disease; and generate the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge.
  • In terms of generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge, the receiving unit 501 is specifically configured to: take any set of clinical data as the target clinical data and calculate the Jaccard similarities between the target clinical data and each of the multiple sets of test knowledge; take the test knowledge with the maximum Jaccard similarity as the target test knowledge and record the mapping between the target clinical data and the target test knowledge as one clinical test data item; repeat these steps to obtain the clinical test data items for all sets of clinical data; and merge those items to obtain the clinical test data set.
  • In terms of calculating the Jaccard similarities, the receiving unit 501 is further specifically configured to: obtain a preset Jaccard calculation formula; take any one of the multiple sets of test knowledge and use the target clinical data and that set as the input of the formula to obtain their Jaccard similarity; and repeat these steps to obtain multiple Jaccard similarities.
  • In terms of training, the training unit 502 is specifically configured to: use the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model; use the multiple sets of test knowledge as the input of the second attention layer; obtain the clinical feature weight vector of the clinical data from the first attention layer and the knowledge feature weight vector of the test knowledge from the second attention layer; input both weight vectors into the preset similarity ranking model to obtain the embedding vectors corresponding to the clinical data and the test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge; and train the preset similarity ranking model on the embedding vectors to obtain the trained target similarity ranking model.
  • In terms of obtaining the multiple items of test knowledge, the judging unit 503 is specifically configured to: compute a similarity matrix for the data to be inspected with the trained target similarity ranking model; sort the similarities in the matrix in descending order to obtain the sorted target similarity matrix; and take the first k similarities of the target similarity matrix to obtain the k corresponding items of test knowledge. The multiple items of test knowledge include these k items, which correspond to the data to be inspected, and k is an integer greater than 1.
  • In terms of determining the abnormal state, the judging unit 503 is further specifically configured to: obtain the clinical test conclusion corresponding to the data to be inspected and determine whether it includes the k items of test knowledge; if it does, determine whether the conclusion is consistent with the k items, in which case it is in a non-abnormal state, and otherwise in an abnormal state; and if the conclusion includes none of the k items, determine that it is in an abnormal state.
  • An embodiment of the present application also provides a computer-readable storage medium that stores a computer program for electronic data exchange, where the computer program causes a computer to perform part or all of the steps of any data anomaly detection method described in the above method embodiments.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • The computer program is operable to cause a computer to execute part or all of the steps of any data anomaly detection method described in the foregoing method embodiments.

Abstract

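A data anomaly detection method and device, applied to a server. The method includes: receiving a clinical data set and a test knowledge set, and generating a clinical test data set from the clinical data set and the test knowledge set (101); performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model (102); and receiving data to be inspected, inputting it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determining from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state (103).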

Description

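Data anomaly detection method and device
This application claims priority to Chinese patent application No. 202010598054.8, titled "Data anomaly detection method and device", filed with the China Patent Office on June 28, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data processing, and in particular to a data anomaly detection method and device.
Background
Testing and examination is an important clinical process. It uses laboratory tools to assess a patient's health status and physiological function and assists diagnosis and treatment in clinical medicine. The inventors found that tests are usually ordered by a doctor on the basis of the patient's chief complaint and the doctor's own clinical experience. The whole process is therefore highly subjective, and because doctors differ in clinical experience, the outcome of the testing process also varies, which easily leads to missed tests or excess tests. Missed tests leave key clinical indicators absent, and excess tests lengthen the testing workflow. A method for detecting anomalies in the tests doctors order is therefore still lacking, which results in low testing accuracy and a poor user experience.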
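Summary
To address the lack of a method for detecting anomalies in the tests doctors order, and the low testing accuracy that results, the embodiments of this application provide a data anomaly detection method and device, a server, and a computer-readable storage medium, which help improve the accuracy of data anomaly detection.
A first aspect of the embodiments of this application provides a data anomaly detection method, applied to a server, including: receiving a clinical data set and a test knowledge set, and generating a clinical test data set from the clinical data set and the test knowledge set; performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receiving data to be inspected, inputting it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determining from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
A second aspect of the embodiments of this application provides a data anomaly detection device, applied to a server, the device including a receiving unit, a training unit, and a judging unit. The receiving unit is configured to receive a clinical data set and a test knowledge set and generate a clinical test data set from them. The training unit is configured to perform a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model. The judging unit is configured to receive data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
A third aspect of the embodiments of this application provides a server including a processor, an input device, an output device, and a memory that are connected to each other. The memory stores a computer program that includes program instructions, and the processor is configured to call the program instructions to: receive a clinical data set and a test knowledge set, and generate a clinical test data set from them; perform a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receive data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
A fourth aspect of the embodiments of this application provides a computer-readable storage medium that stores a computer program for electronic data exchange, where the computer program causes a computer to perform the following steps: receiving a clinical data set and a test knowledge set, and generating a clinical test data set from them; performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receiving data to be inspected, inputting it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determining from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
A fifth aspect of the embodiments of this application provides a computer program product that includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform part or all of the steps described in the first aspect of the embodiments of this application. The computer program product may be a software installation package.
Implementing the embodiments of this application has at least the following beneficial effect: multiple items of test knowledge corresponding to the data to be inspected can be obtained from the trained target similarity ranking model and compared with the clinical test conclusion corresponding to that data to decide whether the conclusion is abnormal, which helps improve the efficiency of data anomaly detection.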
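Brief Description of the Drawings
[Rectified under Rule 91, 03.11.2020]
FIG. 1A is a schematic flowchart of a data anomaly detection method provided by an embodiment of this application.
FIG. 1B is a schematic diagram of the network structure of a preset similarity ranking model provided by an embodiment of this application.
FIG. 1C is a schematic network flow diagram of a data anomaly detection method provided by an embodiment of this application.
FIG. 2 is a schematic flowchart of a data anomaly detection method provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of a data anomaly detection method provided by an embodiment of this application.
FIG. 4 is a schematic structural diagram of a server provided by an embodiment of this application.
FIG. 5 is a schematic structural diagram of a data anomaly detection device provided by an embodiment of this application.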
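Detailed Description
For a better understanding of the embodiments of this application, the method to which they apply is introduced below.
The server mentioned in the embodiments of this application may include, but is not limited to, a background server, a component server, a cloud server, a data distribution system server, or a data distribution software server. These are examples only and are not exhaustive.
Referring to FIG. 1A, FIG. 1A is a schematic flowchart of a data anomaly detection method provided by an embodiment of this application. The method is applied to a server and includes the following steps 101-103.
101. Receive a clinical data set and a test knowledge set, and generate a clinical test data set from the clinical data set and the test knowledge set.
In clinical practice, testing and examination is an important process that uses laboratory tools to assess a patient's health status and physiological function and assists diagnosis and treatment in clinical medicine. In the embodiments of this application, the clinical data set may include multiple sets of clinical data, the test knowledge set may include multiple sets of test knowledge, and the clinical test data set may include multiple clinical test data items, each of which may include one set of clinical data and one set of test knowledge. In practice, the clinical data set can be derived from historical clinical data and may include patients' clinical symptoms and the tests actually ordered by doctors, for example the patient's condition, routine blood data, routine urine data, and electrocardiograms, without limitation here. The test knowledge may include various symptoms and their corresponding common tests, for example methods and guidance for diagnosing viral and bacterial acute respiratory infections, the preparation and staining of blood smears in blood tests, or sampling volumes and other testing standards.
Optionally, to protect the security of users' medical data, the clinical data set, the test knowledge set, and the clinical test data set may be stored in nodes of a blockchain. The blockchain referred to in the embodiments of this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.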
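In one possible example, generating the clinical test data set from the clinical data set and the test knowledge set in step 101 includes steps 11-13.
11. Obtain multiple sets of clinical data from the clinical data set, where each set of clinical data includes at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom.
12. Obtain multiple sets of test knowledge from the test knowledge set, where each set of test knowledge includes one disease and at least one item of symptom test knowledge corresponding to that disease.
13. Generate the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge.
The disease may include at least one of the following: fever, chills, cough, runny nose, red and swollen throat, and so on, without limitation here.
The clinical test data set generated from the clinical data set and the test knowledge set can serve as the training set for the subsequent model training operation, which supports the later judgment of clinical test conclusions.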
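In one possible example, generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge in step 13 may include the following steps 131-132.
131. Take any set of clinical data from the multiple sets of clinical data as the target clinical data, and calculate the Jaccard similarities between the target clinical data and the multiple sets of test knowledge.
132. Take the test knowledge corresponding to the maximum of the Jaccard similarities as the target test knowledge, and generate the mapping between the target clinical data and the target test knowledge as one clinical test data item; repeat these steps to obtain the clinical test data items corresponding to the multiple sets of clinical data, and merge them to obtain the clinical test data set.
The Jaccard similarity is an index for evaluating the similarity between two data sets, here the test knowledge and the clinical data.
Because the multiple sets of clinical data and the multiple sets of test knowledge are large and varied, the clinical data and the test knowledge can be evaluated by calculating their Jaccard similarities in order to reduce the workload of subsequent model training and improve the accuracy of later clinical test conclusions. The Jaccard similarities are used to filter the test knowledge and the clinical data and so obtain the clinical test data set used for model training.
In a specific implementation, the Jaccard similarity between any one set of clinical data and each set of test knowledge is calculated to obtain multiple Jaccard similarities, and the test knowledge corresponding to the maximum is selected as the target test knowledge. Repeating these steps yields the clinical test data items for all sets of clinical data, and these items form the clinical test data set used for subsequent model training.
In one possible example, calculating the Jaccard similarities between the target clinical data and the multiple sets of test knowledge in step 131 may include the following steps:
Obtain a preset Jaccard calculation formula; take any one of the multiple sets of test knowledge and use the target clinical data and that set as the input of the formula to obtain their Jaccard similarity; repeat these steps to obtain multiple Jaccard similarities.
The preset Jaccard calculation formula may be set by the user or defaulted by the system, without limitation here.
For example, the preset Jaccard calculation formula may be set as J(A,B)=(|A∩B|)/(|A∪B|), where A denotes the target clinical data, B denotes any one set of test knowledge, and J(A,B) denotes the Jaccard similarity between A and B. In other words, it is the number of tests common to the two groups (target clinical data A and test knowledge B) divided by the number of distinct elements across the two groups. In this way, the Jaccard similarities between the target clinical data and the multiple sets of test knowledge are obtained, and repeating these steps yields the test knowledge corresponding to each set of clinical data and hence the clinical test data set.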
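Steps 131-132 and the formula J(A,B) can be sketched as follows. This is a minimal illustration, assuming each set of clinical data and each set of test knowledge is represented as a set of test names; the function names, the sample data, and the convention that two empty sets have similarity 0 are assumptions for illustration and do not come from the application.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty (assumed convention)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_clinical_test_dataset(
    clinical_groups: List[FrozenSet[str]],
    knowledge_groups: Dict[str, FrozenSet[str]],
) -> List[Tuple[FrozenSet[str], str]]:
    """Map each set of clinical data to the test knowledge with the highest Jaccard similarity."""
    dataset = []
    for tests in clinical_groups:
        # Step 131: similarity against every set of test knowledge.
        # Step 132: keep the one with the maximum similarity as the target.
        best = max(knowledge_groups, key=lambda name: jaccard(tests, knowledge_groups[name]))
        dataset.append((tests, best))
    return dataset


# Hypothetical sample data, keyed by disease name.
clinical = [
    frozenset({"blood_routine", "chest_xray"}),
    frozenset({"urine_routine"}),
]
knowledge = {
    "respiratory_infection": frozenset({"blood_routine", "chest_xray", "sputum_culture"}),
    "urinary_infection": frozenset({"urine_routine", "urine_culture"}),
}
pairs = build_clinical_test_dataset(clinical, knowledge)
```

Here `pairs` holds one (clinical data, disease) mapping per set of clinical data, which plays the role of the clinical test data set used for training.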
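102. Perform a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model.
The preset similarity ranking model may be set by the user or defaulted by the system, without limitation here; for example, it may be a deep neural network (DNN).
In the embodiments of this application, the clinical test data set is used as training data for the preset similarity ranking model, so that the trained target similarity ranking model can optimize, in the overall direction, for the items most similar to the input. This serves the need in the embodiments to optimize the items of test knowledge corresponding to the most similar clinical data.
In one possible example, step 102 may include the following steps 21-25.
21. Use the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model.
22. Use the multiple sets of test knowledge as the input of the second attention layer of the preset similarity ranking model.
23. Obtain the clinical feature weight vector of the multiple sets of clinical data from the first attention layer, and obtain the knowledge feature weight vector of the multiple sets of test knowledge from the second attention layer.
24. Input the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain the embedding vectors corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge.
25. Train the preset similarity ranking model on the embedding vectors to obtain the trained target similarity ranking model.
As shown in FIG. 1B, which is a schematic diagram of the network structure of a preset similarity ranking model provided by an embodiment of this application, the preset similarity ranking model may include a DNN network model and may further include a first attention layer and a second attention layer.
In a specific implementation, the server may use multiple sets of clinical data (for example, X1=(A1i,B1i,E1i,G1i,F1i)) as the input of the first attention layer of the preset similarity ranking model. The first attention layer contains a first vector with the same dimension as the clinical data and adjusts the values of its elements through model training. The value of any element of the first vector indicates how important that element is to the DNN-based similarity ranking model: the higher the value, the greater the importance, and vice versa.
Further, multiple sets of test knowledge (for example, X2=(A2i,B2i,E2i,G2i,F2i)) may be used as the input of the second attention layer of the preset similarity ranking model. The second attention layer contains a second vector with the same dimension as the test knowledge and adjusts the values of its elements through model training. The clinical feature weight vector of the clinical data is obtained from the first attention layer and the knowledge feature weight vector of the test knowledge from the second attention layer; the two are input separately into the DNN network and the model training operation is performed to obtain the embedding vectors of the clinical data and the test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge.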
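The idea of an attention layer that holds one trainable value per input feature can be sketched as below. This is only an illustration under stated assumptions: the class name `FeatureAttention`, the softmax normalization, and the element-wise weighting are choices made for the sketch, since the application does not specify how the layer's vector is applied; in a real model the scores would be updated by training rather than set by hand.

```python
import math
from typing import List


class FeatureAttention:
    """Holds one score per input feature; higher scores mean more important features."""

    def __init__(self, dim: int) -> None:
        # Same dimension as the input; these would be trainable parameters in a real model.
        self.scores: List[float] = [0.0] * dim

    def weights(self) -> List[float]:
        """Softmax over the scores (an assumed normalization), so the weights sum to 1."""
        peak = max(self.scores)
        exps = [math.exp(s - peak) for s in self.scores]
        total = sum(exps)
        return [e / total for e in exps]

    def __call__(self, features: List[float]) -> List[float]:
        """Return the feature weight vector: each input feature scaled by its weight."""
        if len(features) != len(self.scores):
            raise ValueError("input dimension must match the attention vector")
        return [w * x for w, x in zip(self.weights(), features)]


# Hypothetical five-feature input, mirroring the shape of X1 = (A, B, E, G, F).
layer = FeatureAttention(dim=5)
layer.scores = [2.0, 0.0, 0.0, 0.0, 0.0]  # pretend training made the first feature dominant
weighted = layer([1.0, 1.0, 1.0, 1.0, 1.0])
```

Inspecting `layer.weights()` after training is what would give the per-feature importance that the description later refers to as interpretability.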
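Still further, the preset similarity ranking model can be updated on the basis of the embedding vectors to obtain the final trained target similarity ranking model. Specifically, an embedding-vector cosine loss function can be obtained and used to constrain the distribution of the embedding vectors, so that the cosine similarity between the embedding vectors of matching clinical data and test knowledge becomes larger and the cosine similarity between the embedding vectors of non-matching clinical data and test knowledge becomes smaller. A preset TopkRankLoss loss function can then be obtained and embedded to update the preset similarity ranking model, giving an updated similarity ranking model. Finally, the embedding vectors are input and the model training operation is performed to obtain the trained target similarity ranking model.
The embedding-vector cosine loss function may be set by the user or defaulted by the system, without limitation here, and the same applies to the preset TopkRankLoss loss function. For example, the preset TopkRankLoss loss function may include: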
Figure PCTCN2020118419-appb-000001
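where x_i is any item in the similarity list computed by the trained target similarity ranking model for any clinical test data item in the clinical test data set,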
Figure PCTCN2020118419-appb-000002
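is the position of x_i in the similarity list, maxK is the total length of the similarity list, and positive, simi_positive, and negative indicate that the clinical data and the test knowledge in the clinical test data item match, approximately match, and do not match, respectively.
That is, when the clinical data matches the test knowledge, the test knowledge is at the first position of the similarity list, the loss is 0, and no correction is needed. When the clinical data and the test knowledge approximately match or do not match, the test knowledge is at position k of the similarity list, where k is an integer greater than 1; the lower the similarity between the clinical data and the test knowledge, the larger k is and the larger the resulting loss.
It can be seen that, in the embodiments of this application, the algorithm introduces the preset TopkRankLoss loss function to meet the need, in clinical test quality control, for the several test items most similar to the patient. During training it also introduces the embedding-vector cosine loss function and alternates training between the embedding-vector cosine loss function and the preset TopkRankLoss loss function. The trained target similarity ranking model can therefore optimize in the overall direction for the items most similar to the input while also meeting the need for the several most similar test items in this scenario, which overcomes the limitation of existing methods that can only optimize for the single most similar item.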
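The effect of the embedding-vector cosine loss, pulling matched pairs together and pushing unmatched pairs apart, can be sketched as follows. This uses a common margin-based form of cosine embedding loss as an assumption; the application leaves the exact cosine loss to the user or system, and its TopkRankLoss formula appears only as a figure, so neither is reproduced here. The function names and the `margin` parameter are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_embedding_loss(
    clinical_vec: Sequence[float],
    knowledge_vec: Sequence[float],
    matched: bool,
    margin: float = 0.0,
) -> float:
    """Assumed margin form: matched pairs are penalized for low similarity,
    unmatched pairs for similarity above the margin."""
    sim = cosine_similarity(clinical_vec, knowledge_vec)
    if matched:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Hypothetical two-dimensional embeddings.
loss_matched_aligned = cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], matched=True)
loss_unmatched_aligned = cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], matched=False)
loss_unmatched_orthogonal = cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], matched=False)
```

Minimizing this loss raises the cosine similarity of matched clinical data and test knowledge and lowers it for unmatched pairs, which is the constraint on the embedding distribution described above.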
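In addition, the embodiments of this application introduce an attention mechanism into the preset similarity ranking model, so that during training the model can learn the importance of each feature in the clinical data and the test knowledge. This provides interpretability for the final clinical test result and addresses the poor interpretability of neural network models.
103. Receive data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
The data to be inspected may include clinical data to be inspected, which may include the disease, routine blood data, routine urine data, electrocardiograms, and so on, without limitation here.
In a specific implementation, the data to be inspected is input into the trained target similarity ranking model to generate a similarity list and obtain multiple items of test knowledge. The server can then determine the positions of those items in the similarity list and, from those positions, determine whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state. In other words, the clinical test conclusion undergoes a quality check to judge its accuracy.
It can be seen that, when a doctor faces a complex condition and finds it difficult to give complete and accurate test orders, the embodiments of this application provide quality assurance, which reduces the doctor's workload and improves efficiency.
In one possible example, inputting the data to be inspected into the trained target similarity ranking model to obtain multiple items of test knowledge in step 103 may include the following steps 311-313.
311. Compute a similarity matrix for the data to be inspected with the trained target similarity ranking model.
312. Sort the similarities in the similarity matrix in descending order to obtain the sorted target similarity matrix.
313. Take the first k similarities of the target similarity matrix to obtain the k corresponding items of test knowledge. The multiple items of test knowledge include these k items, which correspond to the data to be inspected, and k is an integer greater than 1.
In a specific implementation, the server inputs the data to be inspected into the target similarity ranking model to obtain the corresponding similarity matrix, which contains the Jaccard similarities between the data to be inspected and multiple items of test data in the target similarity ranking model.
Further, to improve the efficiency of data anomaly detection, the similarities in the matrix can be sorted by a rule, for example in descending order, to produce the sorted target similarity matrix. Because a larger similarity indicates a better match, the first k similarities of the target similarity matrix are generally taken, and the k items of test knowledge corresponding to them are the multiple items of test knowledge mentioned above. For example, the test knowledge corresponding to the top 5 similarities can be selected as the criterion for judging whether the clinical test conclusion described below is in an abnormal state, where k is an integer greater than 1.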
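Steps 312-313 reduce to a descending sort followed by taking the first k entries. The sketch below assumes the model's output for one record has already been flattened into a mapping from test-knowledge name to similarity score; the function name and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_knowledge(similarities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Sort similarities from largest to smallest and return the first k entries."""
    if k <= 1:
        # The description requires k to be an integer greater than 1.
        raise ValueError("k must be an integer greater than 1")
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical similarity scores for one record to be inspected.
scores = {
    "blood_routine": 0.91,
    "chest_xray": 0.85,
    "urine_routine": 0.12,
    "ecg": 0.40,
}
selected = top_k_knowledge(scores, k=2)
```

With these scores, `selected` contains `blood_routine` and `chest_xray`, which would then serve as the reference for judging the clinical test conclusion.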
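In one possible example, determining from the multiple items of test knowledge whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state in step 103 may include the following steps 321-323.
321. Obtain the clinical test conclusion corresponding to the data to be inspected, and determine whether the clinical test conclusion includes the k items of test knowledge.
322. If the clinical test conclusion includes the k items of test knowledge, determine whether the clinical test conclusion is consistent with them; if consistent, determine that the clinical test conclusion is in a non-abnormal state; if inconsistent, determine that the clinical test conclusion is in an abnormal state.
323. If the clinical test conclusion includes none of the k items of test knowledge, determine that the clinical test conclusion is in an abnormal state.
The server can determine, from the multiple items of test knowledge and the clinical test conclusion corresponding to the data to be inspected, whether that conclusion is in an abnormal state. The abnormal state may include at least one of the following: an excess-test state, a missed-test state, and so on. Specifically, when the clinical test conclusion includes the k items of test knowledge but is inconsistent with them, the conclusion is in the excess-test abnormal state; if the conclusion includes none of the k items, it is in the missed-test abnormal state; and if the conclusion includes the k items and is consistent with them, it is in a non-abnormal state.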
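The decision in steps 321-323 can be sketched with set comparisons. This is an illustration only: it assumes the clinical test conclusion and the k items of test knowledge are both sets of test names, and it reads "consistent" as set equality. The function name and the returned labels are hypothetical, and the last branch (partial overlap) is a case the description does not address, so it is labeled separately here as an assumption.

```python
from typing import Iterable, Set


def judge_conclusion(conclusion: Iterable[str], top_k: Iterable[str]) -> str:
    """Classify a clinical test conclusion against the top-k test knowledge."""
    ordered: Set[str] = set(conclusion)
    expected: Set[str] = set(top_k)
    if not (ordered & expected):
        # Step 323: none of the k items appear in the conclusion.
        return "abnormal: missed tests"
    if expected <= ordered:
        # Step 322: all k items are present; consistent only if nothing extra was ordered.
        if ordered == expected:
            return "normal"
        return "abnormal: excess tests"
    # Assumption: only some of the k items are present; the source does not define this case.
    return "abnormal: partial match"


reference = {"blood_routine", "chest_xray"}
result_ok = judge_conclusion({"blood_routine", "chest_xray"}, reference)
result_excess = judge_conclusion({"blood_routine", "chest_xray", "ecg"}, reference)
result_missed = judge_conclusion({"urine_routine"}, reference)
```

A deployment would likely need a clinically reviewed rule for the partial-overlap case rather than the placeholder label used in this sketch.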
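FIG. 1C is a schematic network flow diagram of a data anomaly detection method. The server inputs the multiple sets of clinical data and the multiple sets of test knowledge into the first and second attention layers of the preset similarity ranking model respectively, which output the clinical feature weight vector of the clinical data and the knowledge feature weight vector of the test knowledge. Both are input into the DNN network model to obtain the embedding vectors corresponding to the clinical data and the test knowledge, which represent the similarity between them. An embedding-vector cosine loss function is then obtained and used to constrain the distribution of the embedding vectors, and a preset TopkRankLoss loss function is obtained. The preset similarity ranking model is updated on the basis of the preset TopkRankLoss loss function, and the updated model is trained on the embedding vectors to obtain the trained target similarity ranking model. Introducing the attention mechanism into the preset similarity ranking model in this way lets the model learn the importance of each feature in the clinical data and the test knowledge during training, which provides interpretability for the final clinical test result and addresses the poor interpretability of neural network models.
Finally, the server inputs the data to be inspected into the target similarity ranking model to obtain a similarity list, determines from the list the multiple items of test knowledge corresponding to the data to be inspected, and compares those items with the clinical test result corresponding to the data to be inspected to determine whether that result is in an abnormal state. This helps improve the efficiency of clinical data checking.
It can be seen that the data anomaly detection method described in the embodiments of this application is applied to a server. It can receive a clinical data set and a test knowledge set, generate a clinical test data set from them, and perform a training operation with the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model. It then receives the data to be inspected, inputs it into the trained model to obtain multiple items of test knowledge, and determines from those items whether the corresponding clinical test conclusion is in an abnormal state. In this way, the test knowledge obtained from the trained model is compared with the clinical test conclusion to decide whether that conclusion is abnormal, which helps improve the efficiency of data anomaly detection.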
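Consistent with the above, refer to FIG. 2, which is an example flowchart of a data anomaly detection method disclosed in an embodiment of this application. The method is applied to a server and may include the following steps 201-209.
201. Receive a clinical data set and a test knowledge set, and obtain multiple sets of clinical data from the clinical data set, where each set of clinical data includes at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom.
202. Obtain multiple sets of test knowledge from the test knowledge set, where each set of test knowledge includes one disease and at least one item of symptom test knowledge corresponding to that disease.
203. Generate the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge.
204. Use the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model.
205. Use the multiple sets of test knowledge as the input of the second attention layer of the preset similarity ranking model.
206. Obtain the clinical feature weight vector of the multiple sets of clinical data from the first attention layer, and obtain the knowledge feature weight vector of the multiple sets of test knowledge from the second attention layer.
207. Input the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain the embedding vectors corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge.
208. Train the preset similarity ranking model on the embedding vectors to obtain the trained target similarity ranking model.
209. Receive data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
For the data anomaly detection method described in steps 201-209, refer to the corresponding steps of the method described with FIG. 1A.
It can be seen that the data anomaly detection method described in the embodiments of this application is applied to a server and carries out steps 201-209 as described: it builds the clinical test data set from the clinical data and the test knowledge, passes the clinical data and the test knowledge through the first and second attention layers to obtain the clinical and knowledge feature weight vectors, inputs those vectors into the preset similarity ranking model to obtain embedding vectors that represent the similarity between the clinical data and the test knowledge, trains the model on the embedding vectors, and then uses the trained target similarity ranking model to obtain multiple items of test knowledge for the data to be inspected and to determine whether the corresponding clinical test conclusion is in an abnormal state. In this way, the server trains the preset similarity ranking model on the clinical data set and the test knowledge; that is, it makes full use of the need, in clinical test quality control, for the several test items most similar to the patient, which helps improve the accuracy of the model's similarity calculation. The attention mechanism introduced during training lets the model learn the importance of each feature in the clinical data and the test knowledge, which provides interpretability for the final clinical test result and addresses the poor interpretability of neural network models. Finally, the clinical test conclusion is evaluated with the target similarity ranking model to determine whether it is abnormal, which helps improve the efficiency of data anomaly detection.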
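Consistent with the above, refer to FIG. 3, which is an example flowchart of a data anomaly detection method disclosed in an embodiment of this application. The method is applied to a server and may include the following steps 301-306.
301. Receive a clinical data set and a test knowledge set, and generate a clinical test data set from the clinical data set and the test knowledge set.
302. Perform a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model.
303. Receive data to be inspected and input it into the trained target similarity ranking model to obtain multiple items of test knowledge, which include k items of test knowledge that correspond to the data to be inspected, where k is an integer greater than 1.
304. Obtain the clinical test conclusion corresponding to the data to be inspected, and determine whether the clinical test conclusion includes the k items of test knowledge.
305. If the clinical test conclusion includes the k items of test knowledge, determine whether the clinical test conclusion is consistent with them; if consistent, determine that the clinical test conclusion is in a non-abnormal state; if inconsistent, determine that the clinical test conclusion is in an abnormal state.
306. If the clinical test conclusion includes none of the k items of test knowledge, determine that the clinical test conclusion is in an abnormal state.
For the data anomaly detection method described in steps 301-306, refer to the corresponding steps of the method described with FIG. 1A.
It can be seen that the data anomaly detection method described in the embodiments of this application is applied to a server and carries out steps 301-306 as described: it generates the clinical test data set, trains the preset similarity ranking model on it, obtains for the data to be inspected multiple items of test knowledge that include the k corresponding items, and judges the clinical test conclusion as non-abnormal only when it includes the k items and is consistent with them, and as abnormal when it is inconsistent with them or includes none of them. In this way, the k items of test knowledge with the highest similarity to the data to be inspected are determined by the trained target similarity ranking model and compared with the corresponding clinical test conclusion to decide whether that conclusion is abnormal, which helps improve the efficiency of data anomaly detection.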
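Consistent with the above, refer to FIG. 4, which is a schematic structural diagram of a server provided by an embodiment of this application. As shown in FIG. 4, the server includes a processor, a communication interface, a memory, and one or more programs. The processor, the communication interface, and the memory are connected to each other; the memory stores a computer program that includes program instructions, and the processor is configured to call the program instructions. The one or more programs include instructions for performing the following steps: receiving a clinical data set and a test knowledge set, and generating a clinical test data set from them; performing a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model; and receiving data to be inspected, inputting it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determining from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
It can be seen that the server described in the embodiments of this application can receive a clinical data set and a test knowledge set, generate a clinical test data set from them, and train the preset similarity ranking model on the clinical test data set to obtain the trained target similarity ranking model. It then receives the data to be inspected, inputs it into the trained model to obtain multiple items of test knowledge, and determines from those items whether the corresponding clinical test conclusion is in an abnormal state. Comparing the test knowledge with the clinical test conclusion in this way helps improve the efficiency of data anomaly detection.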
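In one possible example, in terms of generating the clinical test data set from the clinical data set and the test knowledge set, the program includes instructions for performing the following steps: obtaining multiple sets of clinical data from the clinical data set, where each set includes at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom; obtaining multiple sets of test knowledge from the test knowledge set, where each set includes one disease and at least one item of symptom test knowledge corresponding to that disease; and generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge.
In one possible example, in terms of generating the clinical test data set from the multiple sets of clinical data and the multiple sets of test knowledge, the program includes instructions for performing the following steps: taking any set of clinical data as the target clinical data and calculating the Jaccard similarities between the target clinical data and the multiple sets of test knowledge; taking the test knowledge corresponding to the maximum Jaccard similarity as the target test knowledge and generating the mapping between the target clinical data and the target test knowledge as one clinical test data item; repeating these steps to obtain the clinical test data items corresponding to the multiple sets of clinical data; and merging those items to obtain the clinical test data set.
In one possible example, in terms of calculating the Jaccard similarities between the target clinical data and the multiple sets of test knowledge, the program includes instructions for performing the following steps: obtaining a preset Jaccard calculation formula; taking any one of the multiple sets of test knowledge and using the target clinical data and that set as the input of the formula to obtain their Jaccard similarity; and repeating these steps to obtain multiple Jaccard similarities.
In one possible example, in terms of performing the training operation with the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model, the program includes instructions for performing the following steps: using the multiple sets of clinical data as the input of the first attention layer of the preset similarity ranking model; using the multiple sets of test knowledge as the input of the second attention layer; obtaining the clinical feature weight vector from the first attention layer and the knowledge feature weight vector from the second attention layer; inputting both weight vectors into the preset similarity ranking model to obtain the embedding vectors corresponding to the clinical data and the test knowledge, where the embedding vectors represent the similarity between the clinical data and the test knowledge; and training the preset similarity ranking model on the embedding vectors to obtain the trained target similarity ranking model.
In one possible example, in terms of inputting the data to be inspected into the trained target similarity ranking model to obtain multiple items of test knowledge, the program includes instructions for performing the following steps: computing a similarity matrix for the data to be inspected with the trained target similarity ranking model; sorting the similarities in the matrix in descending order to obtain the sorted target similarity matrix; and taking the first k similarities of the target similarity matrix to obtain the k corresponding items of test knowledge, where the multiple items of test knowledge include these k items, the k items correspond to the data to be inspected, and k is an integer greater than 1.
In one possible example, in terms of determining from the multiple items of test knowledge whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state, the program includes instructions for performing the following steps: obtaining the clinical test conclusion corresponding to the data to be inspected and determining whether it includes the k items of test knowledge; if it does, determining whether the conclusion is consistent with the k items, in which case it is in a non-abnormal state, and otherwise in an abnormal state; and if the conclusion includes none of the k items, determining that it is in an abnormal state.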
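The above mainly introduces the solutions of the embodiments of this application from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the server includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that, given the units and algorithm steps of the examples described in the embodiments provided herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The embodiments of this application may divide the server into functional units according to the above method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in the form of hardware or as a software functional unit. It should be noted that the division of units in the embodiments of this application is illustrative and is only a division by logical function; other divisions are possible in actual implementations.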
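Consistent with the above, refer to FIG. 5, which is a schematic structural diagram of a data anomaly detection device disclosed in an embodiment of this application. The device is applied to a server and includes a receiving unit 501, a training unit 502, and a judging unit 503. The receiving unit 501 is configured to receive a clinical data set and a test knowledge set and generate a clinical test data set from them. The training unit 502 is configured to perform a training operation with the clinical test data set as the training data of a preset similarity ranking model to obtain a trained target similarity ranking model. The judging unit 503 is configured to receive data to be inspected, input it into the trained target similarity ranking model to obtain multiple items of test knowledge, and determine from those items whether the clinical test conclusion corresponding to the data to be inspected is in an abnormal state.
It can be seen that the data anomaly detection device described in the embodiments of this application is applied to a server. The device can receive a clinical data set and a test knowledge set, generate a clinical test data set from them, and train the preset similarity ranking model on the clinical test data set to obtain the trained target similarity ranking model. It then receives the data to be inspected, inputs it into the trained model to obtain multiple items of test knowledge, and determines from those items whether the corresponding clinical test conclusion is in an abnormal state. Comparing the test knowledge with the clinical test conclusion in this way helps improve the efficiency of data anomaly detection.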
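In a possible example, in terms of generating the clinical test data set according to the clinical data set and the test knowledge set, the receiving unit 501 is specifically configured to: obtain multiple sets of clinical data from the clinical data set, where any one of the multiple sets of clinical data includes at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom; obtain multiple sets of test knowledge from the test knowledge set, where any one of the multiple sets of test knowledge includes one disease and at least one piece of symptom test knowledge corresponding to that disease; and generate the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge.
In a possible example, in terms of generating the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge, the receiving unit 501 is specifically configured to: take any one set of clinical data from the multiple sets of clinical data as target clinical data, and calculate multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge; take the test knowledge corresponding to the maximum of the multiple Jaccard similarities as target test knowledge, and generate a mapping between the target clinical data and the target test knowledge as clinical test data; repeat the above steps to obtain multiple pieces of clinical test data corresponding to the multiple sets of clinical data; and merge the multiple pieces of clinical test data to obtain the clinical test data set.
In a possible example, in terms of calculating the multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge, the receiving unit 501 is further specifically configured to: obtain a preset Jaccard calculation formula; take any one set of test knowledge from the multiple sets of test knowledge; use the target clinical data and that set of test knowledge as the inputs of the preset Jaccard calculation formula to obtain the Jaccard similarity between the target clinical data and that set of test knowledge; and repeat the above steps to obtain the multiple Jaccard similarities.
In a possible example, in terms of performing a training operation with the clinical test data set as the training data of the preset similarity ranking model to obtain the trained target similarity ranking model, the training unit 502 is specifically configured to: use the multiple sets of clinical data as the input of a first attention layer of the preset similarity ranking model; use the multiple sets of test knowledge as the input of a second attention layer of the preset similarity ranking model; obtain, from the first attention layer, a clinical feature weight vector of the multiple sets of clinical data, and obtain, from the second attention layer, a knowledge feature weight vector of the multiple sets of test knowledge; input the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain an embedding vector corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, where the embedding vector represents the similarity between the multiple sets of clinical data and the multiple sets of test knowledge; and train the preset similarity ranking model based on the embedding vector to obtain the trained target similarity ranking model.
In a possible example, in terms of inputting the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge, the judging unit 503 is specifically configured to: compute a similarity matrix for the data to be checked based on the trained target similarity ranking model; sort the similarities contained in the similarity matrix in descending order to obtain a sorted target similarity matrix; and determine the top k similarities of the target similarity matrix to obtain the k pieces of test knowledge corresponding to those k similarities, where the multiple pieces of test knowledge include the k pieces of test knowledge, the k pieces of test knowledge correspond to the data to be checked, and k is an integer greater than 1.
In a possible example, in terms of determining, according to the multiple pieces of test knowledge, whether the clinical test conclusion corresponding to the data to be checked is in an abnormal state, the judging unit 503 is further specifically configured to: obtain the clinical test conclusion corresponding to the data to be checked, and determine whether the clinical test conclusion contains the k pieces of test knowledge; if the clinical test conclusion contains the k pieces of test knowledge, determine whether the clinical test conclusion is consistent with the k pieces of test knowledge; if consistent, determine that the clinical test conclusion is in a non-abnormal state; if inconsistent, determine that the clinical test conclusion is in an abnormal state; and if the clinical test conclusion does not contain any of the k pieces of test knowledge, determine that the clinical test conclusion is in an abnormal state.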
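An embodiment of this application further provides a computer-readable storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any data anomaly detection method described in the above method embodiments. The computer-readable storage medium may be non-volatile or volatile.
An embodiment of this application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any data anomaly detection method described in the above method embodiments.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.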

Claims (20)

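  1. A data anomaly detection method, applied to a server, comprising:
    receiving a clinical data set and a test knowledge set, and generating a clinical test data set according to the clinical data set and the test knowledge set;
    performing a training operation with the clinical test data set as training data of a preset similarity ranking model to obtain a trained target similarity ranking model;
    receiving data to be checked, inputting the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge, and determining, according to the multiple pieces of test knowledge, whether a clinical test conclusion corresponding to the data to be checked is in an abnormal state.
  2. The method according to claim 1, wherein the generating a clinical test data set according to the clinical data set and the test knowledge set comprises:
    obtaining multiple sets of clinical data from the clinical data set, wherein any one of the multiple sets of clinical data comprises at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom;
    obtaining multiple sets of test knowledge from the test knowledge set, wherein any one of the multiple sets of test knowledge comprises one disease and at least one piece of symptom test knowledge corresponding to the one disease;
    generating the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge.
  3. The method according to claim 2, wherein the generating the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge comprises:
    taking any one set of clinical data from the multiple sets of clinical data as target clinical data, and calculating multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge;
    taking the test knowledge corresponding to the maximum of the multiple Jaccard similarities as target test knowledge, generating a mapping between the target clinical data and the target test knowledge as clinical test data, repeating the above steps to obtain multiple pieces of clinical test data corresponding to the multiple sets of clinical data, and merging the multiple pieces of clinical test data to obtain the clinical test data set.
  4. The method according to claim 3, wherein the calculating multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge comprises:
    obtaining a preset Jaccard calculation formula, taking any one set of test knowledge from the multiple sets of test knowledge, using the target clinical data and that set of test knowledge as inputs of the preset Jaccard calculation formula to obtain the Jaccard similarity between the target clinical data and that set of test knowledge, and repeating the above steps to obtain the multiple Jaccard similarities.
  5. The method according to claim 2, wherein the performing a training operation with the clinical test data set as training data of a preset similarity ranking model to obtain a trained target similarity ranking model comprises:
    using the multiple sets of clinical data as an input of a first attention layer of the preset similarity ranking model;
    using the multiple sets of test knowledge as an input of a second attention layer of the preset similarity ranking model;
    obtaining, from the first attention layer, a clinical feature weight vector of the multiple sets of clinical data, and obtaining, from the second attention layer, a knowledge feature weight vector of the multiple sets of test knowledge;
    inputting the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain an embedding vector corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, wherein the embedding vector represents the similarity between the multiple sets of clinical data and the multiple sets of test knowledge;
    training the preset similarity ranking model based on the embedding vector to obtain the trained target similarity ranking model.
  6. The method according to claim 1, wherein the inputting the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge comprises:
    computing a similarity matrix for the data to be checked based on the trained target similarity ranking model;
    sorting the similarities contained in the similarity matrix in descending order to obtain a sorted target similarity matrix;
    determining the top k similarities of the target similarity matrix to obtain k pieces of test knowledge corresponding to the k similarities, wherein the multiple pieces of test knowledge comprise the k pieces of test knowledge, the k pieces of test knowledge correspond to the data to be checked, and k is an integer greater than 1.
  7. The method according to claim 6, wherein the determining, according to the multiple pieces of test knowledge, whether a clinical test conclusion corresponding to the data to be checked is in an abnormal state comprises:
    obtaining the clinical test conclusion corresponding to the data to be checked, and determining whether the clinical test conclusion contains the k pieces of test knowledge;
    if the clinical test conclusion contains the k pieces of test knowledge, determining whether the clinical test conclusion is consistent with the k pieces of test knowledge; if consistent, determining that the clinical test conclusion is in a non-abnormal state; if inconsistent, determining that the clinical test conclusion is in an abnormal state;
    if the clinical test conclusion does not contain any of the k pieces of test knowledge, determining that the clinical test conclusion is in an abnormal state.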
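  8. A data anomaly detection device, applied to a server, the device comprising a receiving unit, a training unit, and a judging unit, wherein
    the receiving unit is configured to receive a clinical data set and a test knowledge set, and to generate a clinical test data set according to the clinical data set and the test knowledge set;
    the training unit is configured to perform a training operation with the clinical test data set as training data of a preset similarity ranking model to obtain a trained target similarity ranking model;
    the judging unit is configured to receive data to be checked, input the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge, and determine, according to the multiple pieces of test knowledge, whether a clinical test conclusion corresponding to the data to be checked is in an abnormal state.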
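  9. A server, comprising a processor, an input device, an output device, and a memory that are connected to one another, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the following:
    receiving a clinical data set and a test knowledge set, and generating a clinical test data set according to the clinical data set and the test knowledge set;
    performing a training operation with the clinical test data set as training data of a preset similarity ranking model to obtain a trained target similarity ranking model;
    receiving data to be checked, inputting the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge, and determining, according to the multiple pieces of test knowledge, whether a clinical test conclusion corresponding to the data to be checked is in an abnormal state.
  10. The server according to claim 9, wherein the processor is configured to:
    obtain multiple sets of clinical data from the clinical data set, wherein any one of the multiple sets of clinical data comprises at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom;
    obtain multiple sets of test knowledge from the test knowledge set, wherein any one of the multiple sets of test knowledge comprises one disease and at least one piece of symptom test knowledge corresponding to the one disease;
    generate the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge.
  11. The server according to claim 10, wherein the processor is configured to:
    take any one set of clinical data from the multiple sets of clinical data as target clinical data, and calculate multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge;
    take the test knowledge corresponding to the maximum of the multiple Jaccard similarities as target test knowledge, generate a mapping between the target clinical data and the target test knowledge as clinical test data, repeat the above steps to obtain multiple pieces of clinical test data corresponding to the multiple sets of clinical data, and merge the multiple pieces of clinical test data to obtain the clinical test data set.
  12. The server according to claim 11, wherein the processor is configured to:
    obtain a preset Jaccard calculation formula, take any one set of test knowledge from the multiple sets of test knowledge, use the target clinical data and that set of test knowledge as inputs of the preset Jaccard calculation formula to obtain the Jaccard similarity between the target clinical data and that set of test knowledge, and repeat the above steps to obtain the multiple Jaccard similarities.
  13. The server according to claim 10, wherein the processor is configured to:
    use the multiple sets of clinical data as an input of a first attention layer of the preset similarity ranking model;
    use the multiple sets of test knowledge as an input of a second attention layer of the preset similarity ranking model;
    obtain, from the first attention layer, a clinical feature weight vector of the multiple sets of clinical data, and obtain, from the second attention layer, a knowledge feature weight vector of the multiple sets of test knowledge;
    input the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain an embedding vector corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, wherein the embedding vector represents the similarity between the multiple sets of clinical data and the multiple sets of test knowledge;
    train the preset similarity ranking model based on the embedding vector to obtain the trained target similarity ranking model.
  14. The server according to claim 9, wherein the processor is configured to:
    compute a similarity matrix for the data to be checked based on the trained target similarity ranking model;
    sort the similarities contained in the similarity matrix in descending order to obtain a sorted target similarity matrix;
    determine the top k similarities of the target similarity matrix to obtain k pieces of test knowledge corresponding to the k similarities, wherein the multiple pieces of test knowledge comprise the k pieces of test knowledge, the k pieces of test knowledge correspond to the data to be checked, and k is an integer greater than 1.
  15. The server according to claim 14, wherein the processor is configured to:
    obtain the clinical test conclusion corresponding to the data to be checked, and determine whether the clinical test conclusion contains the k pieces of test knowledge;
    if the clinical test conclusion contains the k pieces of test knowledge, determine whether the clinical test conclusion is consistent with the k pieces of test knowledge; if consistent, determine that the clinical test conclusion is in a non-abnormal state; if inconsistent, determine that the clinical test conclusion is in an abnormal state;
    if the clinical test conclusion does not contain any of the k pieces of test knowledge, determine that the clinical test conclusion is in an abnormal state.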
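  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the following steps:
    receiving a clinical data set and a test knowledge set, and generating a clinical test data set according to the clinical data set and the test knowledge set;
    performing a training operation with the clinical test data set as training data of a preset similarity ranking model to obtain a trained target similarity ranking model;
    receiving data to be checked, inputting the data to be checked into the trained target similarity ranking model to obtain multiple pieces of test knowledge, and determining, according to the multiple pieces of test knowledge, whether a clinical test conclusion corresponding to the data to be checked is in an abnormal state.
  17. The computer-readable storage medium according to claim 16, wherein the program instructions, when executed by the processor, further cause the processor to perform the following steps:
    obtaining multiple sets of clinical data from the clinical data set, wherein any one of the multiple sets of clinical data comprises at least one clinical symptom and the clinical test conclusion corresponding to the at least one clinical symptom;
    obtaining multiple sets of test knowledge from the test knowledge set, wherein any one of the multiple sets of test knowledge comprises one disease and at least one piece of symptom test knowledge corresponding to the one disease;
    generating the clinical test data set according to the multiple sets of clinical data and the multiple sets of test knowledge.
  18. The computer-readable storage medium according to claim 17, wherein the program instructions, when executed by the processor, further cause the processor to perform the following steps:
    taking any one set of clinical data from the multiple sets of clinical data as target clinical data, and calculating multiple Jaccard similarities between the target clinical data and the multiple sets of test knowledge;
    taking the test knowledge corresponding to the maximum of the multiple Jaccard similarities as target test knowledge, generating a mapping between the target clinical data and the target test knowledge as clinical test data, repeating the above steps to obtain multiple pieces of clinical test data corresponding to the multiple sets of clinical data, and merging the multiple pieces of clinical test data to obtain the clinical test data set.
  19. The computer-readable storage medium according to claim 18, wherein the program instructions, when executed by the processor, further cause the processor to perform the following steps:
    obtaining a preset Jaccard calculation formula, taking any one set of test knowledge from the multiple sets of test knowledge, using the target clinical data and that set of test knowledge as inputs of the preset Jaccard calculation formula to obtain the Jaccard similarity between the target clinical data and that set of test knowledge, and repeating the above steps to obtain the multiple Jaccard similarities.
  20. The computer-readable storage medium according to claim 17, wherein the program instructions, when executed by the processor, further cause the processor to perform the following steps:
    using the multiple sets of clinical data as an input of a first attention layer of the preset similarity ranking model;
    using the multiple sets of test knowledge as an input of a second attention layer of the preset similarity ranking model;
    obtaining, from the first attention layer, a clinical feature weight vector of the multiple sets of clinical data, and obtaining, from the second attention layer, a knowledge feature weight vector of the multiple sets of test knowledge;
    inputting the clinical feature weight vector and the knowledge feature weight vector into the preset similarity ranking model to obtain an embedding vector corresponding to the multiple sets of clinical data and the multiple sets of test knowledge, wherein the embedding vector represents the similarity between the multiple sets of clinical data and the multiple sets of test knowledge;
    training the preset similarity ranking model based on the embedding vector to obtain the trained target similarity ranking model.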
PCT/CN2020/118419 2020-06-28 2020-09-28 Data anomaly detection method and device WO2021114831A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010598054.8A CN111755086A (zh) 2020-06-28 2020-06-28 Data anomaly detection method and device
CN202010598054.8 2020-06-28

Publications (1)

Publication Number Publication Date
WO2021114831A1 (zh) 2021-06-17

Family

ID=72677528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118419 WO2021114831A1 (zh) 2020-06-28 2020-09-28 Data anomaly detection method and device

Country Status (2)

Country Link
CN (1) CN111755086A (zh)
WO (1) WO2021114831A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053478A (zh) * 2021-03-11 2021-06-29 上海交通大学医学院附属新华医院 Project and quality control system suitable for drug clinical trials

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004093822A2 (en) * 2003-03-24 2004-11-04 Nien Wei Methods for predicting an individual's clinical treatment outcome from sampling a group of patients' biological profiles
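CN105260588A (zh) * 2015-10-23 2016-01-20 福建优安米信息科技有限公司 Health protection robot system and data processing method thereof
CN105787232A (zh) * 2014-12-17 2016-07-20 中国移动通信集团公司 Data processing method, device, health system platform and terminal
CN110808095A (zh) * 2019-09-18 2020-02-18 平安科技(深圳)有限公司 Method for diagnosis result recognition and model training, computer device and storage medium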

Also Published As

Publication number Publication date
CN111755086A (zh) 2020-10-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20898241; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/02/2023)
122 Ep: pct application non-entry in european phase. Ref document number: 20898241; Country of ref document: EP; Kind code of ref document: A1