WO2023240837A1 - Service package generation method, apparatus and device based on patient data, and storage medium - Google Patents
- Publication number
- WO2023240837A1 (PCT/CN2022/121728)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- medical
- treatment
- target
- treatment data
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- This application relates to the field of big data technology, and in particular to a method, device, equipment and storage medium for generating a service package based on patient data.
- The traditional medical treatment model does not let patients participate in their own health management, understand the causes of their disease, or learn about disease prevention, and it offers no prevention or treatment services for disease-related complications.
- The main purpose of this application is to solve the technical problem that existing technology cannot analyze original case data to obtain service packages corresponding to different types of diseases, and thereby to improve the efficiency of medical services.
- A first aspect of this application provides a method for generating service packages based on patient data, including: collecting original case data of similar diseases from a preset medical information platform, and extracting the disease types and treatment data in the original case data; de-identifying the treatment data to obtain target treatment data; and extracting multiple key events in the target treatment data and fusing the key events to obtain medical information of the disease corresponding to the disease type.
- A second aspect of this application provides a device for generating a service package based on patient data, including a memory and at least one processor, where instructions are stored in the memory, the memory and the at least one processor are interconnected through lines, and the at least one processor calls the instructions in the memory.
- When the processor executes the computer-readable instructions, the following steps are implemented: collecting original case data of similar diseases from the preset medical information platform, and extracting the disease types and treatment data in the original case data.
- A third aspect of the present application provides a computer-readable storage medium.
- A computer program is stored on the computer-readable storage medium. When run on a computer, the computer program causes the computer to perform the following steps: collecting original case data of similar diseases from the preset medical information platform and extracting the disease types and treatment data in the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events in the target treatment data and fusing the key events to obtain the medical information set of the disease corresponding to the disease type; and inputting the medical information set into the preset BiLSTM model for vector calculation.
- In this application, original case data of similar diseases are collected from a preset medical information platform, the disease types and treatment data in the original case data are extracted, and the treatment data are de-identified.
- The medical feature vector of the medical data is obtained and subjected to pooling analysis to obtain a medical feedforward vector; according to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm.
- Figure 3 is a schematic diagram of the third embodiment of the service package generation method based on patient data provided by this application;
- Figure 4 is a schematic diagram of the fourth embodiment of the service package generation method based on patient data provided by this application;
- Figure 5 is a schematic diagram of the fifth embodiment of the service package generation method based on patient data provided by this application;
- Figure 6 is a schematic diagram of a first embodiment of a service package generation device based on patient data provided by this application;
- Figure 7 is a schematic diagram of a second embodiment of a service package generation device based on patient data provided by this application.
- Figure 8 is a schematic diagram of an embodiment of a service package generation device based on patient data provided by this application.
- The method, device, equipment and storage medium for generating service packages based on patient data first collect original case data of similar diseases from a preset medical information platform and extract the disease type and treatment data in the original case data; de-identify the treatment data to obtain target treatment data; extract multiple key events in the target treatment data and fuse them to obtain a medical information set of the disease corresponding to the disease type; input the target treatment data into the preset BiLSTM model to obtain the medical feature vector of the medical data, and perform pooling analysis on the medical feature vector to obtain a medical feedforward vector; and, according to the medical feedforward vector, perform cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
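The text does not say which clustering algorithm performs the "cluster analysis ... based on a preset cosine similarity algorithm". As one possible reading, the sketch below uses a simple greedy threshold ("leader") clustering over feature vectors. The function names and the 0.8 threshold are hypothetical choices for illustration, not the patent's method.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (assumes neither is the zero vector)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster_by_cosine(vectors: list[np.ndarray], threshold: float = 0.8) -> list[list[int]]:
    """Greedy leader clustering (hypothetical choice): each vector joins the first
    cluster whose leader (first member) has cosine similarity >= threshold,
    otherwise it starts a new cluster. Returns lists of vector indices."""
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vectors[members[0]], v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Leader clustering is order-dependent; a production system might prefer agglomerative or k-means style clustering on normalized vectors, which the source neither requires nor rules out.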
- This application performs cluster analysis on original case data to obtain target treatment data with common characteristics, and generates service packages corresponding to different types of diseases from that data. This addresses the lack of sustainable tracking services in current medical services and the difficulty of achieving timely, effective feedback and effective integration of medical data information, which leaves the quality of mobile medical services without guarantee. It effectively improves the efficiency and portability of medical services, reduces waiting time for medical personnel, and relieves pressure on medical services.
- The first embodiment of the method for generating a service package based on patient data in the embodiments of the present application includes:
- When the business system is designed, the product adds the data fields required by the business to the interface:
- Diagnosis and treatment card information: patient name, gender, age, date of birth, household registration address, residential address, ID type, ID number, mobile phone number, social security card number, ethnicity, nationality, guardian information (name, phone number, ID card, etc.), marital status, etc.
- Patient condition information: affected area, related living habits, condition description, incidence pattern, allergy history, medication history, time of illness, condition grade, etc.
- Doctor's prescribing information: department, diagnosis results, prescription category, drug name, drug specification, drug brand, drug usage.
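The three groups of interface fields can be pictured as record types. This is a sketch only: the class names, the Python types, and the chosen subset of fields are hypothetical, since the source lists field names but no schema.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosisCard:
    # Identifying fields: these are what de-identification must remove or transform.
    patient_name: str
    gender: str
    age: int
    id_number: str
    mobile_phone: str
    residential_address: str = ""


@dataclass
class ConditionInfo:
    affected_area: str
    condition_description: str
    allergy_history: list[str] = field(default_factory=list)
    medication_history: list[str] = field(default_factory=list)
    condition_grade: str = ""


@dataclass
class Prescription:
    department: str
    diagnosis: str
    prescription_category: str
    drug_name: str
    drug_specification: str
    drug_brand: str
    drug_usage: str
```

Keeping identifying card data in its own type makes the later de-identification step a matter of transforming one record type rather than scanning every field.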
- De-identification refers to a data processing method that processes identifiers so that the processed information cannot identify a specific personal information subject.
- Regarding de-identification, the biggest difference between the definitions in China's "Personal Information Security Specification" and "Personal Information De-identification Guidelines" and those in relevant laws of the United States, Canada, and other regions is whether the possibility of indirect identification must be considered in order to prevent re-identification.
- The possibility of re-identification must be evaluated comprehensively, together with any other additional information that may be available.
- De-identification refers to the process of removing the association between a set of identifiable data and the data subject. Through this process, data managers can delete or change the identifying information in a data set, making it difficult or impossible for attackers to use the data set to identify specific individuals, so that the data set can be shared for use within a predetermined scope.
- De-identification is one of the main tools of privacy-preserving data publishing (PPDP) [1]. By removing the association between privacy attributes and data subjects in a data set, while retaining sufficient ability to prevent re-identification, certain attributes of the data set can be shared and published for processing and analysis by external business systems.
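A minimal sketch of record-level de-identification, assuming records arrive as Python dicts with hypothetical field names: direct identifiers are dropped, the ID number is replaced by a salted SHA-256 pseudonym so one patient's records can still be linked, and exact age is generalized into a 10-year band. As the text notes, this alone does not rule out re-identification through indirect identifiers combined with other available information.

```python
import hashlib

# Hypothetical field names for the direct identifiers listed in the card information.
DIRECT_IDENTIFIERS = {"patient_name", "id_number", "mobile_phone",
                      "social_security_no", "residential_address", "household_address"}


def pseudonym(value: str, salt: str) -> str:
    """Salted SHA-256 pseudonym (truncated to 16 hex chars) for record linkage."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers, add a pseudonymous ID, and generalize age."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "id_number" in record:
        out["pseudo_id"] = pseudonym(record["id_number"], salt)
    if "age" in record:
        out["age"] = age_band(record["age"])
    return out
```

The salt must be kept secret; otherwise low-entropy values such as ID numbers could be recovered by hashing candidates.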
- Key events are core events, defined according to the disease of the target object in the treatment data, that can represent the diagnosis and treatment record or the corresponding disease in the diagnosis and treatment plan.
- For example, the disease of the target object in the treatment data may be breast cancer. Treatment strategies for breast cancer differ completely between clinical stages, and even within the same clinical stage the treatment options differ according to molecular subtype and pathological diagnosis. For breast cancer patients who are positive for HER2 (human epidermal growth factor receptor 2), for instance, the timing of administration of the targeted drug trastuzumab and the judgment of its therapeutic efficacy are particularly critical.
- The key events in the treatment data may be the first diagnosis and treatment event or local treatment-related events; they may also be drug treatment-related events, efficacy evaluation events, adverse drug reaction events, etc., and this example embodiment is not limited to these.
- Data fusion technology refers to information processing technology in which computers automatically analyze and synthesize observation information obtained in time series, under certain criteria, to complete the required decision-making and evaluation tasks.
- Data fusion technology includes the collection, transmission, filtering, correlation, and synthesis of useful information from various information sources to assist people in situation or environment assessment, planning, detection, verification, and diagnosis.
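One simple way to fuse extracted key events into a per-disease medical information set is sketched below, under the assumption that fusion means grouping events by disease type, removing exact duplicates reported by more than one source, and ordering each group chronologically. The `KeyEvent` type and its fields are hypothetical; the source does not define the fusion criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen makes events hashable, so duplicates collapse in a set
class KeyEvent:
    disease_type: str
    event_type: str  # e.g. "first_diagnosis", "drug_treatment", "adverse_drug_reaction"
    when: date
    detail: str


def fuse_key_events(events: list[KeyEvent]) -> dict[str, list[KeyEvent]]:
    """Group key events by disease type, drop exact duplicates, sort by date."""
    grouped = defaultdict(set)
    for e in events:
        grouped[e.disease_type].add(e)
    return {disease: sorted(evts, key=lambda e: (e.when, e.event_type))
            for disease, evts in grouped.items()}
```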
- The target treatment data are converted from text form to vector form so that the similarity between vectors can be calculated, and the medical word that best matches the sentence to be analyzed is selected from multiple medical words based on that similarity.
- The BiLSTM (bidirectional long short-term memory) model is a neural network model for natural language processing.
- The BiLSTM model converts the target treatment data into vectors.
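To make "converts the target treatment data into vectors" concrete, here is a minimal NumPy forward pass of a bidirectional LSTM. It assumes each token has already been mapped to an input vector, and its weights are random rather than trained, so the outputs only illustrate shapes and data flow. A real system would train the model, for example with PyTorch's `torch.nn.LSTM(..., bidirectional=True)`. The names `LSTMDirection` and `bilstm` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMDirection:
    """One LSTM direction with randomly initialized (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: np.random.Generator):
        self.hidden_dim = hidden_dim
        # Stacked weights for input (i), forget (f), candidate (g), output (o) gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def run(self, xs: np.ndarray) -> np.ndarray:
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            z = self.W @ x + self.U @ h + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c = f * c + i * g          # update the cell state
            h = o * np.tanh(c)         # emit the hidden state
            outputs.append(h)
        return np.stack(outputs)


def bilstm(xs: np.ndarray, hidden_dim: int = 8, seed: int = 0) -> np.ndarray:
    """Per-token features of shape (T, 2 * hidden_dim): forward and backward states."""
    rng = np.random.default_rng(seed)
    fwd = LSTMDirection(xs.shape[1], hidden_dim, rng)
    bwd = LSTMDirection(xs.shape[1], hidden_dim, rng)
    forward_states = fwd.run(xs)
    backward_states = bwd.run(xs[::-1])[::-1]  # re-align with the original token order
    return np.concatenate([forward_states, backward_states], axis=1)
```

The backward pass is what lets each token's vector reflect both earlier and later words in a symptom description.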
- For example, the sentence to be analyzed is "eye swelling, eye pain, photophobia, hard eyeball, weak vision";
- the first medical word is "glaucoma, acute angle-closure glaucoma, chronic angle-closure glaucoma, primary open-angle glaucoma, filtering bleb separation";
- the second medical word is "myopia";
- the third medical word is "keratitis", etc.
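The matching step in this example can be sketched as choosing the candidate whose vector has the highest cosine similarity to the sentence vector. The vectors in the usage are abstract placeholders, not real embeddings of the medical words above; in the described method they would come from the BiLSTM encoder. `best_matching_word` is a hypothetical name.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_matching_word(sentence_vec, word_vecs: dict) -> tuple:
    """Return (word, score) for the candidate most similar to the sentence vector."""
    scores = {word: cosine_similarity(sentence_vec, vec) for word, vec in word_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Placeholder vectors for three candidate words.
candidates = {"word_a": [1.0, 0.0, 0.0], "word_b": [1.0, 1.0, 0.1], "word_c": [0.0, 0.0, 1.0]}
word, score = best_matching_word([1.0, 1.0, 0.0], candidates)
```

Here `word` is `"word_b"`, whose vector points almost the same way as the sentence vector.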
- The pooling analysis is based on the pooling operation used in convolutional neural networks.
- Max pooling and average pooling (avg pooling) are applied to the target treatment data separately, and max pooling combined with mean pooling (avg pooling) performs the pooling operations in parallel dual pooling layers to retain deeper semantic information of the target treatment data.
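A minimal sketch of the dual pooling idea, assuming the input is a sequence of per-token feature vectors such as BiLSTM outputs: max pooling keeps the strongest activation in each dimension, avg pooling keeps the average, and the two results are concatenated. `dual_pool` is a hypothetical name.

```python
import numpy as np


def dual_pool(features: np.ndarray) -> np.ndarray:
    """features: (T, d) token vectors -> (2d,) vector of [max-pooled, avg-pooled]."""
    max_pooled = features.max(axis=0)   # strongest signal per dimension
    avg_pooled = features.mean(axis=0)  # average semantics per dimension
    return np.concatenate([max_pooled, avg_pooled])
```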
- Cosine similarity measures the similarity between two vectors by the cosine of the angle between them.
- The cosine of a 0-degree angle is 1, the cosine of any other angle is less than 1, and the minimum value is -1.
- The cosine of the angle between two vectors therefore indicates whether the two vectors point in roughly the same direction.
- When two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; and when they point in exactly opposite directions, it is -1. The result depends only on the directions of the vectors, not on their lengths.
- Cosine similarity is usually used in positive spaces, where it gives values between 0 and 1.
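In symbols, for two n-dimensional vectors A and B (placeholder names), the standard definition is:

```latex
\cos\theta
  = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

For example, A = (1, 0) and B = (1, 1) give 1 / (1 · √2) ≈ 0.707, the cosine of 45°. Scaling either vector leaves the value unchanged, which is why only direction matters.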
- For example, the Sanfu moxibustion service package includes five Sanfu moxibustion services. It is designed specifically for elderly people, following the principle of treating winter diseases in summer so as to prevent them in advance.
- The cardiology inpatient secretary service package is intended specifically for patients with cardiac diseases. It includes three services: a cardiology offline diagnosis green pass, a cardiology offline examination green pass, and a cardiology inpatient green pass.
- Each service package carries one or more classification labels.
- The classification labels are drawn from common attributes: by age group, children, teenagers, youth, middle-aged, and elderly; by department, ophthalmology, cardiology, thoracic surgery, etc.; and by applicable group, students, office workers, pregnant women, high-income earners, etc.
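Service packages with classification labels map naturally onto a small record type plus a label filter. The sketch below is illustrative only: `ServicePackage`, `packages_for`, and the label strings are hypothetical, and the source does not specify how labels are matched.

```python
from dataclasses import dataclass, field


@dataclass
class ServicePackage:
    name: str
    services: list[str]
    labels: set[str] = field(default_factory=set)  # one or more classification labels


def packages_for(packages: list[ServicePackage], required: set[str]) -> list[str]:
    """Names of packages that carry every requested label."""
    return [p.name for p in packages if required <= p.labels]


pkgs = [
    ServicePackage("Sanfu moxibustion", ["Sanfu moxibustion"] * 5, {"elderly"}),
    ServicePackage("Cardiology inpatient secretary",
                   ["offline diagnosis green pass", "offline examination green pass",
                    "inpatient green pass"],
                   {"cardiology"}),
]
```

For example, `packages_for(pkgs, {"cardiology"})` selects only the cardiology package. Requiring all labels (set inclusion) is one design choice; matching any label would be equally easy.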
- Original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data in the original case data are extracted; the treatment data are de-identified to obtain target treatment data; multiple key events in the target treatment data are extracted and fused to obtain a medical information set of the disease corresponding to the disease type; the target treatment data are input into the preset BiLSTM model to obtain the medical feature vector of the medical data, which is pooled and analyzed to obtain a medical feedforward vector; and, according to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- Diagnosis and treatment templates are configured for different types of patients, meaning patients who seek medical treatment for different reasons. Patients who seek treatment for a given reason (for example, chest pain, stroke, or trauma) generally follow a corresponding procedure of examination, diagnosis, and treatment, so a diagnosis and treatment template can be built from that procedure and used for chronological verification of subsequently entered information.
- Before the treatment data are obtained, an information entry template is configured. Users enter and collect patient information following the prompts of this pre-configured template, which reduces errors and omissions during entry and reduces the frequency of later modifications.
- The information entry template includes a sub-template for each diagnosis and treatment item of each type of patient.
- The time information of each piece of treatment data is obtained; specifically, the time at which each piece of treatment data is entered by the patient is taken as its time information.
- Using the entry time as the time information of the treatment data makes time entry automatic and reduces workload.
- The user can also be prompted to check the automatically entered time information and correct it if it is wrong.
- The diagnosis and treatment template also includes a duration indicator for each diagnosis and treatment item. The actual duration of each item is calculated from the time information of the current patient's treatment data, and it is determined whether each actual duration meets the corresponding duration indicator; if so, the treatment data are reported, otherwise a warning is issued.
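The duration check can be sketched as follows, under two stated assumptions that the source does not fix: the actual duration of an item is the time between its entry and the entry of the preceding item in the template order, and each duration indicator is a maximum allowed duration. The item names and limits in the usage are hypothetical.

```python
from datetime import datetime, timedelta


def check_durations(entries: list[tuple[str, datetime]],
                    indicators: dict[str, timedelta]) -> tuple[bool, list[str]]:
    """entries: (item, entry_time) in template order; indicators: max duration per item.
    Assumed: actual duration of an item = its entry time minus the previous entry time.
    Returns (ok_to_report, warnings)."""
    warnings = []
    for (_, prev_time), (item, time) in zip(entries, entries[1:]):
        actual = time - prev_time
        limit = indicators.get(item)
        if limit is not None and actual > limit:
            warnings.append(f"{item}: actual {actual} exceeds indicator {limit}")
    return len(warnings) == 0, warnings


t0 = datetime(2023, 1, 1, 10, 0)
limits = {"examination": timedelta(minutes=10), "treatment": timedelta(minutes=90)}
entries = [("registration", t0),
           ("examination", t0 + timedelta(minutes=8)),
           ("treatment", t0 + timedelta(minutes=100))]
ok, warnings = check_durations(entries, limits)
```

Here the treatment step took 92 minutes against a 90-minute indicator, so `ok` is `False` and one warning is produced instead of reporting the data.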
- According to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- The third embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
- A treatment data query database is constructed, and a differential privacy algorithm and an encryption algorithm are used in combination to de-identify the private data in the patient's treatment data and update the treatment data query database, so that a clinical image query and diagnosis and treatment system can be established on the updated database.
- This embodiment not only meets the confidentiality requirements of the differential privacy protection model but also ensures the reliability of the data released by the database system. It can help clinical researchers query and collect past cases and carry out big data analysis and evaluation, laying a good foundation for automating medical data statistics, eliminating information islands, and providing decision support.
- The probability Pr[·] represents the risk of privacy leakage and is controlled by the randomness of the algorithm A(D);
- ε is the privacy protection budget parameter, used to balance data privacy security against data reliability.
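The standard ε-differential privacy definition that Pr[·], A(D), and ε refer to is shown below, where D and D′ are neighboring data sets that differ in one record and S is any set of possible outputs:

```latex
\Pr\left[A(D) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[A(D') \in S\right]
```

A smaller ε forces the two output distributions closer together, giving stronger privacy at the cost of noisier released data; this is the trade-off the budget parameter balances.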
- Different noise mechanisms are used to add random noise to the raw treatment data, depending on the data type of each sensitive field.
- The Laplace mechanism adds random noise to the raw treatment data of numeric sensitive fields;
- the exponential mechanism adds random noise to the raw treatment data of non-numeric sensitive fields.
- The Laplace mechanism handles numerical (continuous) data, such as patient age, achieving differential privacy by adding random noise to the numerical results.
- The exponential mechanism handles non-numeric (discrete) data; instead of returning a deterministic result, it returns each possible result with a certain probability.
- The output is chosen from a set of discrete values according to a scoring function: high-scoring outputs are returned with high probability and low-scoring outputs with low probability.
- Laplace noise is added to numeric sensitive attributes in the data table, such as patient age and examination date;
- the exponential mechanism is applied to attributes such as gender, education level, region, examination equipment type, and disease in the data table, and the noisy results replace the original values in the table.
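The two mechanisms can be sketched in NumPy as follows: the Laplace mechanism adds noise with scale sensitivity / ε, and the exponential mechanism samples a candidate with probability proportional to exp(ε · score / (2 · sensitivity)). The source gives no ε, sensitivities, or category domains, so `AGE_SENSITIVITY`, `REGIONS`, and `perturb_row` are hypothetical settings for illustration.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Return value plus Laplace noise with scale b = sensitivity / epsilon."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


def exponential_mechanism(candidates, score, sensitivity, epsilon, rng):
    """Sample a candidate with probability proportional to exp(eps * score / (2 * sens))."""
    scores = np.array([score(c) for c in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    weights = np.exp(logits - logits.max())  # shift by the max for numerical stability
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]


# Hypothetical settings, not values from the source.
AGE_SENSITIVITY = 120.0  # assumed age range when perturbing a single record
REGIONS = ["north", "south", "east", "west"]


def perturb_row(row, epsilon, rng):
    """Replace a numeric field (age) and a categorical field (region) with noisy values."""
    out = dict(row)
    out["age"] = laplace_mechanism(row["age"], AGE_SENSITIVITY, epsilon, rng)
    # Score 1 for the true region and 0 otherwise, so the score sensitivity is 1.
    out["region"] = exponential_mechanism(
        REGIONS, lambda r: 1.0 if r == row["region"] else 0.0, 1.0, epsilon, rng)
    return out
```

Note that the exponential mechanism does not literally add noise; it replaces the value with a randomly sampled candidate, which is what makes it suitable for categorical fields such as gender or region.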
- According to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- Steps 301, 306-308 in this embodiment are similar to steps 101, 103-105 in the first embodiment, and will not be described again here.
- Original case data of similar diseases are collected from the preset medical information platform, and the disease type and treatment data in the original case data are extracted; the treatment data are de-identified to obtain target treatment data; multiple key events in the target treatment data are extracted and fused to obtain a medical information set of the disease corresponding to the disease type; the target treatment data are input into the preset BiLSTM model to obtain the medical feature vector of the medical data, and pooling analysis is performed on the medical feature vector to obtain a medical feedforward vector; and, according to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- The fourth embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
- The second key event is an event determined by extracting key events from multiple pieces of treatment data taken as a whole.
- For example, the second key event may be the first diagnosis and treatment event across all treatment data of the corresponding series for the target object, or an adverse drug reaction event across all treatment data.
- The second key event may also be determined by joint judgment over treatment data recorded across multiple visits, such as the first chemotherapy event after recurrence: first determine the visit of the first recurrence, then extract chemotherapy drugs from the drug orders of all visits after that visit, and finally find the first visit in which a chemotherapy drug appears; this identifies the first chemotherapy event after recurrence.
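The three-step cross-visit judgment for "first chemotherapy after recurrence" can be sketched as below. The `Visit` record, its `is_recurrence` flag, and the set of chemotherapy drug names are hypothetical inputs; the source does not say how recurrence is recorded or how chemotherapy drugs are recognized.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Visit:
    visit_date: date
    is_recurrence: bool      # hypothetical flag: this visit records a recurrence
    drug_orders: list[str]


def first_chemo_after_recurrence(visits: list[Visit], chemo_drugs: set[str]) -> Optional[Visit]:
    ordered = sorted(visits, key=lambda v: v.visit_date)
    # Step 1: locate the visit of the first recurrence.
    idx = next((i for i, v in enumerate(ordered) if v.is_recurrence), None)
    if idx is None:
        return None
    # Steps 2-3: scan drug orders of later visits for the first chemotherapy drug.
    for v in ordered[idx + 1:]:
        if chemo_drugs.intersection(v.drug_orders):
            return v
    return None
```

Sorting by date first matters: chemotherapy given before the recurrence (as in an initial course of treatment) must not be counted.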
- the treatment data of the target subject are jointly constituted in chronological order according to the first key event extracted from the sorted single treatment data and the second key event extracted from the sorted plurality of treatment data. corresponding key events.
- Attribute characteristics can refer to different attributes corresponding to key events.
- For example, an attribute characteristic may be the time attribute of a key event or the recurrence type of a key event.
- Attribute characteristics may also be other attributes of key events; this embodiment does not specifically limit them.
- According to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- By performing cluster analysis on original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of diseases from that data. This addresses three problems in current medical services that leave the quality of mobile medical services unguaranteed: a lack of sustainable follow-up tracking, difficulty providing timely and effective feedback, and a lack of effective integration of medical data. It effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
- After passing through convolutional layers with different window sizes, and the filter units beneath them that perform feature extraction, the medical feature vector is fed into two parallel pooling layers for dimensionality reduction: an avg pooling layer and a max pooling layer.
- Combining the avg pooling layer with the max pooling layer makes full use of the dynamic extraction property of max pooling and the contribution of avg pooling to the average semantics of short texts, effectively reducing the loss of semantic information during dimensionality reduction. Finally, the necessary semantic concatenation is performed in the concatenation layer to form the original feedforward vector and the medical feedforward vector. Both the dynamic extraction of the max pooling layer and the short-text average semantics of the avg pooling layer take into account the effect of the convolution kernel's sliding-window height on the generated feature map.
- The height of the convolution kernel serves as an important basis for choosing the number M of down-sampling operations on the feature map.
- The frequency requirement may be that the frequency with which a target medical feature appears in the historical analysis model meets a threshold condition. Alternatively, the target medical features may be ranked by how often they appear in the historical analysis model, and those ranked within the required range are taken as the target medical features that satisfy the frequency requirement.
- multidimensional data may refer to all data stored in the database, and may include data newly added every time the data is changed as well as historical data before the change.
- the initial data refers to the user data.
- The user data is the medical data generated by medical visits and stored under the user's name, which can include historical and current medical data. Specifically, it can include, but is not limited to, consultation location, consultation time, International Classification of Diseases (ICD) code, registration department, registered doctor information, registration fee, payment method, examination items, examination fee, condition description, medical advice, drug list, drug price, drug dosage, payment window, drug collection window, whether to return for a follow-up consultation, follow-up consultation time, and number of consultations.
- the server can extract initial data from multi-dimensional data based on the selected target medical features.
- The extracted initial data can be divided into multiple categories. For example, medical insurance data may include, but is not limited to, expense data for the current visit, ICD data for the current visit, and historical medical data.
- Expense data for the current visit may include, but is not limited to, surgery fees, drug fees, and examination fees.
- ICD data for the current visit may include, but is not limited to, the cost associated with the ICD code confirmed at this visit and the average cost for that ICD code.
- Historical medical data may include, but is not limited to, the number of local outpatient visits, the number of local hospitalizations, the number of outpatient visits elsewhere, the number of hospitalizations elsewhere, the proportion of local outpatient visits, and the proportion of outpatient visits elsewhere.
- The extracted initial data may differ greatly in magnitude; for example, the drug fee may be 500 while the total cost is 1,000,000.
- The server can process initial data of different magnitudes with a common-magnitude method to obtain standard data of the same magnitude. Continuing the previous example, bringing the drug fee and total cost to the same magnitude, in the range 0 to 100, gives a standard drug fee of 0.05 and a standard total cost of 100.
- The method used to bring data to a common magnitude can be chosen according to the data type or magnitude, for example square root, square, cube, exponential, or logarithm; this application does not limit the choice.
- The preset dimensions may be set in advance by the user on the server via a terminal, according to subsequent data processing requirements.
- the data volume of the target data in the preset dimensions may be smaller than the data volume of the standard data.
- Nonlinear dimensionality reduction methods may include, but are not limited to, Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), Modified Locally Linear Embedding (MLLE), Hessian Eigenmapping, Spectral Embedding, Local Tangent Space Alignment (LTSA), Multi-dimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE).
- Linear dimensionality reduction methods may also be used, including but not limited to Principal Component Analysis (PCA), kernel PCA, and Incremental PCA.
- Following the above method, the server can exploit the clustering characteristics of the data in the multidimensional Riemannian space to map the multidimensional standard data to a low dimension, for example 2 dimensions, to obtain the target data.
- In this way, the target medical features are obtained through the historical analysis model. The initial data corresponding to those features is then extracted from the multidimensional data and brought to a common magnitude to obtain standard data, and nonlinear dimensionality reduction is applied to the standard data to obtain target data of the preset dimensions.
- Because the target data is derived from, and remains related to, the multidimensional data, it preserves the characteristics of the multidimensional data, so subsequent processing and analysis can be performed on the target data.
- The pooling analysis is based on the pooling operation used in convolutional neural networks.
- Max pooling and average (avg) pooling are performed in parallel in two pooling layers to retain deeper semantic information of the medical feature vectors.
- Steps 501-503 and 510 in this embodiment are similar to steps 101-103 and 105 in the first embodiment, and will not be described again here.
- A first embodiment of the service package generation device based on patient data includes:
- the collection module 601 is used to collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data in the original case data;
- De-identification module 602 is used to de-identify the treatment data to obtain target treatment data
- the fusion module 603 is used to extract multiple key events in the target treatment data and perform fusion processing on the key events to obtain a medical information set of diseases corresponding to the disease type;
- Pooling module 604 is used to input the medical information set into the preset Bilstm model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain a medical feedforward vector;
- FIG. 7 shows a second embodiment of the service package generation device based on patient data in the embodiments of this application.
- the service package generation device based on patient data specifically includes:
- De-identification module 602 is used to de-identify the treatment data to obtain target treatment data
- the fusion module 603 is used to extract multiple key events in the target treatment data and perform fusion processing on the key events to obtain a medical information set of diseases corresponding to the disease type;
- Pooling module 604 is used to input the medical information set into the preset Bilstm model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain a medical feedforward vector;
- the de-identification module 602 is specifically used to:
- the identifier field is encrypted to obtain target treatment data.
- the fusion module 603 is specifically used to:
- the effective medical data is extracted according to the key event set to obtain multiple key events in the target treatment data.
- the score data of the key event is determined based on the attribute characteristics and the weight value, and a medical information set of diseases corresponding to the disease type is obtained based on the score data.
- the pooling module 604 includes:
- the feature extraction unit 6041 is used to extract features from the medical feature vector to obtain the target medical features of the medical feature vector;
- the dimensionality reduction unit 6042 is used to perform dimensionality reduction processing on the target medical features to obtain target data of preset dimensions;
- the pooling unit 6043 is used to perform pooling processing on the target data to obtain the medical feedforward vector of the target treatment data.
- Original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data are extracted from the original case data; the treatment data are de-identified to obtain target treatment data; multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type; the target treatment data are input into the preset BiLSTM model to obtain medical feature vectors of the medical data, and the medical feature vectors are pooled and analyzed to obtain a medical feedforward vector; and, according to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
- FIG. 8 is a schematic structural diagram of a service package generation device based on patient data provided by an embodiment of the present application.
- The service package generation device 800 based on patient data may vary considerably with configuration or performance. It may include one or more central processing units (CPU) 810 (e.g., one or more processors), a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 833 or data 832.
- the memory 820 and the storage medium 830 may be short-term storage or persistent storage.
- the program stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the service package generation device 800 based on patient data.
- the computer-readable storage medium can be a non-volatile computer-readable storage medium.
- the computer-readable storage medium can also be a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium. When the instructions are run on the computer, the computer is caused to execute the steps of the above-mentioned service package generation method based on patient data.
- If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
- The technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application.
- The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
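The fourth embodiment describes a rule for one second key event: locate the visit of the first recurrence, scan the drug orders of later visits, and take the first visit that contains a chemotherapy drug. A minimal sketch of that rule follows. The `Visit` record, its fields (`visit_date`, `is_recurrence`, `drug_orders`), and the chemotherapy drug set are hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Visit:
    """Hypothetical visit record; field names are illustrative, not from the patent."""
    visit_date: date
    is_recurrence: bool = False
    drug_orders: list[str] = field(default_factory=list)


def first_chemo_after_recurrence(visits: list[Visit], chemo_drugs: set[str]) -> Optional[Visit]:
    """Return the first visit after the first recurrence whose orders contain a chemo drug."""
    # Events are judged in chronological order across all visits.
    ordered = sorted(visits, key=lambda v: v.visit_date)
    # Step 1: locate the first visit flagged as a recurrence.
    start = next((i for i, v in enumerate(ordered) if v.is_recurrence), None)
    if start is None:
        return None
    # Steps 2-3: scan later visits and return the first one with a chemo drug order.
    for visit in ordered[start + 1:]:
        if any(drug in chemo_drugs for drug in visit.drug_orders):
            return visit
    return None


visits = [
    Visit(date(2022, 1, 5), drug_orders=["cisplatin"]),
    Visit(date(2022, 3, 1), is_recurrence=True),
    Visit(date(2022, 3, 10), drug_orders=["ondansetron"]),
    Visit(date(2022, 3, 20), drug_orders=["cisplatin"]),
]
event = first_chemo_after_recurrence(visits, {"cisplatin"})
print(event.visit_date)  # 2022-03-20
```

The chemotherapy given before the recurrence (January) is ignored, because only visits after the first recurrence visit are scanned.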
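The magnitude-unification example in the list above maps a drug fee of 500 and a total cost of 1,000,000 to 0.05 and 100. That result is consistent with one shared linear scale factor that sends the largest value to 100. The sketch below assumes that reading; the patent does not give the exact formula. The function name and the optional `transform` hook (for the square-root, logarithm and similar methods mentioned) are illustrative.

```python
from typing import Callable, Optional


def to_common_magnitude(values: dict[str, float], upper: float = 100.0,
                        transform: Optional[Callable[[float], float]] = None) -> dict[str, float]:
    """Rescale features with one shared factor so the largest magnitude maps to `upper`."""
    # Optional range compression first, e.g. math.sqrt or math.log10.
    staged = {k: (transform(v) if transform else v) for k, v in values.items()}
    peak = max(abs(v) for v in staged.values())
    if peak == 0:
        return {k: 0.0 for k in staged}
    # A single scale factor preserves the ratios between features.
    return {k: v * upper / peak for k, v in staged.items()}


standard = to_common_magnitude({"drug_fee": 500, "total_cost": 1_000_000})
print(standard)  # {'drug_fee': 0.05, 'total_cost': 100.0}
```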
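The fusion module scores each key event from its attribute characteristics and weight values, then derives the medical information set from those scores. The patent does not give the scoring formula. This sketch therefore assumes a weighted sum over numeric attribute values followed by a score threshold; the attribute names, weights and threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    name: str
    # Hypothetical numeric attribute characteristics, e.g. {"recency": 0.8}.
    attributes: dict[str, float]


def score_event(event: KeyEvent, weights: dict[str, float]) -> float:
    # Weighted sum over attributes; attributes without a weight contribute 0.
    return sum(weights.get(k, 0.0) * v for k, v in event.attributes.items())


def build_information_set(events: list[KeyEvent], weights: dict[str, float],
                          threshold: float) -> list[tuple[str, float]]:
    """Keep events whose score reaches the threshold, highest score first."""
    scored = [(e.name, score_event(e, weights)) for e in events]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


events = [
    KeyEvent("first_visit", {"recency": 0.2, "recurrence": 0.0}),
    KeyEvent("first_chemo_after_recurrence", {"recency": 0.9, "recurrence": 1.0}),
    KeyEvent("adverse_drug_reaction", {"recency": 0.6, "recurrence": 0.0}),
]
weights = {"recency": 0.5, "recurrence": 0.5}
print(build_information_set(events, weights, threshold=0.25))
```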
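The list above describes two variants of the frequency requirement for target medical features: a count threshold, or a ranking by frequency. Both can be sketched with `collections.Counter`. The feature names, and the idea that the historical analysis model yields a flat log of feature occurrences, are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def frequent_features(occurrences: list[str], min_count: Optional[int] = None,
                      top_k: Optional[int] = None) -> list[str]:
    """Select target medical features by occurrence frequency.

    min_count implements the threshold variant; top_k implements the ranking variant.
    """
    # most_common() sorts by count, highest first (ties keep first-seen order).
    ranked = Counter(occurrences).most_common()
    if min_count is not None:
        ranked = [(feat, n) for feat, n in ranked if n >= min_count]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [feat for feat, _ in ranked]


# Hypothetical log of which features the historical analysis model used.
log = ["drug_fee", "icd", "drug_fee", "exam_fee", "drug_fee", "icd"]
print(frequent_features(log, min_count=2))  # ['drug_fee', 'icd']
print(frequent_features(log, top_k=1))      # ['drug_fee']
```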
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The present application relates to the technical field of big data. Disclosed are a service package generation method, apparatus and device based on patient data, and a storage medium. The method comprises: performing de-identification processing on collected original case data, so as to obtain target treatment data of a patient; extracting a plurality of key events from the target treatment data, and performing fusion processing on the key events, so as to obtain a medical information set of a disease corresponding to a disease type; inputting the target treatment data into a preset Bilstm model, so as to obtain a medical feature vector of medical data, and performing pooling analysis on the medical feature vector, so as to obtain a medical feedforward vector; and according to the medical feedforward vector, performing clustering analysis on the medical information set on the basis of a preset cosine similarity algorithm, so as to obtain a medical service package corresponding to the disease. By means of the present application, original case data is clustered to obtain service packages corresponding to different types of diseases with common features, thereby solving the technical problem of low medical service efficiency, and relieving the medical pressure.
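The embodiments describe the pooling analysis as parallel max pooling and avg pooling whose outputs are concatenated into the feedforward vector. That step can be sketched in plain Python as follows. This is a minimal sketch under assumptions: the input is taken to be a (steps x dims) feature map such as BiLSTM outputs, the convolutional layers with different window sizes that precede pooling are omitted, and `dual_pool` is a hypothetical name.

```python
def dual_pool(feature_map: list[list[float]]) -> list[float]:
    """Pool a (steps x dims) feature map over its steps with max and avg pooling, then concatenate."""
    if not feature_map:
        raise ValueError("feature_map must contain at least one step")
    columns = list(zip(*feature_map))  # one tuple per feature dimension
    max_pooled = [max(col) for col in columns]             # strongest activation per dimension
    avg_pooled = [sum(col) / len(col) for col in columns]  # average semantics per dimension
    return max_pooled + avg_pooled                          # concatenation layer


steps = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]
print(dual_pool(steps))  # [5.0, 4.0, 3.0, 2.0]
```

Running both poolings on the same input and concatenating them doubles the output width. This mirrors the stated goal of keeping both the peak signal and the average semantics.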
Description
This application claims priority to Chinese patent application No. 202210671458.4, filed with the China Patent Office on June 15, 2022, and entitled "Service package generation method, apparatus, device and storage medium based on patient data", the entire contents of which are incorporated herein by reference.
This application relates to the field of big data technology, and in particular to a method, apparatus, device and storage medium for generating a service package based on patient data.
At present, medical care in China is basically disease-specific. A user who finds they are ill goes to a hospital; after diagnosis, the doctor prescribes a treatment plan such as medication and follow-up examinations, and the patient follows the plan. Once the patient recovers, the visit is considered over. The inventors realized that this approach has the following shortcomings:
1. The traditional model does not go on to involve patients in their own health management. It does not help them understand the causes of their disease or learn about disease prevention, and it offers no prevention or treatment services for disease-related complications.
2. The traditional model is passive. It has no mechanism for early prevention and does nothing to improve patients' overall health.
3. In the traditional model, doctors basically treat one disease at a time, with no comprehensive prevention and treatment of complications or latent diseases.
At present, only a small number of wealthy people have dedicated family doctors, and most ordinary people lack comprehensive health management. How to let most people receive early prevention when healthy, comprehensive treatment when ill, and comprehensive health management overall is a widespread social problem. Therefore, how to provide patients with convenient medical care based on patient data and improve the efficiency of their visits has become a technical problem to be solved by those skilled in the art.
Summary of the Invention
The main purpose of this application is to solve the technical problem in the prior art that original case data cannot be analyzed to obtain service packages for different types of diseases, thereby improving the efficiency of medical services.
A first aspect of this application provides a method for generating a service package based on patient data, including: collecting original case data of similar diseases from a preset medical information platform, and extracting the disease type and treatment data from the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events from the target treatment data, and fusing the key events to obtain a medical information set for the disease corresponding to the disease type; inputting the medical information set into a preset BiLSTM model for vector calculation to obtain medical feature vectors of the disease, and performing pooling analysis on the medical feature vectors to obtain a medical feedforward vector; and, according to the medical feedforward vector, performing cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
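The final clustering step can be illustrated with a small cosine-similarity clustering routine. The patent does not specify which clustering procedure runs on top of cosine similarity. This sketch therefore swaps in a simple greedy, leader-style clustering with a similarity threshold; the case IDs, vectors, and the 0.9 threshold are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_cosine(vectors: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Greedy clustering: join the most similar cluster if similarity >= threshold, else start one."""
    members: list[list[str]] = []
    sums: list[list[float]] = []  # running vector sums; same direction as each cluster's centroid
    for key, vec in vectors.items():
        sims = [cosine(vec, s) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(key)
            sums[best] = [s + x for s, x in zip(sums[best], vec)]
        else:
            members.append([key])
            sums.append(list(vec))
    return members


# Hypothetical feedforward vectors for four cases.
cases = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0], "p4": [0.1, 0.95]}
print(cluster_by_cosine(cases))  # [['p1', 'p2'], ['p3', 'p4']]
```

Each resulting group would be one candidate basis for a disease-specific service package. The result depends on input order and on the threshold, which is a known trade-off of leader-style clustering.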
A second aspect of this application provides a device for generating a service package based on patient data, including a memory and at least one processor. Instructions are stored in the memory, and the memory and the at least one processor are interconnected through lines. The at least one processor invokes the instructions in the memory, and when executing the computer-readable instructions, the processor implements the following steps: collecting original case data of similar diseases from a preset medical information platform, and extracting the disease type and treatment data from the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events from the target treatment data, and fusing the key events to obtain a medical information set for the disease corresponding to the disease type; inputting the medical information set into a preset BiLSTM model for vector calculation to obtain medical feature vectors of the disease, and performing pooling analysis on the medical feature vectors to obtain a medical feedforward vector; and, according to the medical feedforward vector, performing cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
A third aspect of this application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the following steps: collecting original case data of similar diseases from a preset medical information platform, and extracting the disease type and treatment data from the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events from the target treatment data, and fusing the key events to obtain a medical information set for the disease corresponding to the disease type; inputting the medical information set into a preset BiLSTM model for vector calculation to obtain medical feature vectors of the disease, and performing pooling analysis on the medical feature vectors to obtain a medical feedforward vector; and, according to the medical feedforward vector, performing cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
A fourth aspect of this application provides an apparatus for generating a service package based on patient data, including the following modules:
- a collection module, configured to collect original case data of similar diseases from a preset medical information platform and extract the disease type and treatment data from the original case data;
- a de-identification module, configured to de-identify the treatment data to obtain target treatment data;
- a fusion module, configured to extract multiple key events from the target treatment data and fuse the key events to obtain a medical information set for the disease corresponding to the disease type;
- a pooling module, configured to input the medical information set into a preset BiLSTM model for vector calculation to obtain medical feature vectors of the disease, and to perform pooling analysis on the medical feature vectors to obtain a medical feedforward vector;
- and a clustering module, configured to perform, according to the medical feedforward vector, cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
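For the de-identification module, the embodiments state only that identifier fields are encrypted to obtain target treatment data. The sketch below substitutes keyed pseudonymization (HMAC-SHA256 from the Python standard library) for that step. The field names and key are hypothetical. HMAC is one-way, so a deployment that must recover identifiers would need reversible authenticated encryption instead.

```python
import hashlib
import hmac

# Hypothetical set of direct-identifier field names.
IDENTIFIER_FIELDS = {"name", "id_number", "phone"}


def deidentify(record: dict[str, str], key: bytes) -> dict[str, str]:
    """Replace identifier fields with keyed SHA-256 pseudonyms; keep clinical fields unchanged."""
    out = {}
    for field_name, value in record.items():
        if field_name in IDENTIFIER_FIELDS:
            # Same key and value always yield the same pseudonym, so records stay linkable.
            out[field_name] = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            out[field_name] = value
    return out


record = {"name": "Zhang San", "phone": "13800000000", "icd": "E11.9", "drug": "metformin"}
masked = deidentify(record, key=b"demo-secret-key")
print(masked["icd"])  # E11.9
```

Keeping the pseudonyms deterministic lets visits from the same patient be grouped for key-event extraction without exposing the raw identifiers.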
In the technical solution provided by this application, original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data are extracted from the original case data. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The target treatment data are input into the preset BiLSTM model to obtain medical feature vectors of the medical data, and pooling analysis on those vectors yields a medical feedforward vector. According to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of diseases from that data. This addresses three problems in current medical services that leave the quality of mobile medical services unguaranteed: a lack of sustainable follow-up tracking, difficulty providing timely and effective feedback, and a lack of effective integration of medical data. It effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
FIG. 1 is a schematic diagram of a first embodiment of the method for generating a service package based on patient data provided by this application;
FIG. 2 is a schematic diagram of a second embodiment of the method for generating a service package based on patient data provided by this application;
FIG. 3 is a schematic diagram of a third embodiment of the method for generating a service package based on patient data provided by this application;
FIG. 4 is a schematic diagram of a fourth embodiment of the method for generating a service package based on patient data provided by this application;
FIG. 5 is a schematic diagram of a fifth embodiment of the method for generating a service package based on patient data provided by this application;
FIG. 6 is a schematic diagram of a first embodiment of the apparatus for generating a service package based on patient data provided by this application;
FIG. 7 is a schematic diagram of a second embodiment of the apparatus for generating a service package based on patient data provided by this application;
FIG. 8 is a schematic diagram of an embodiment of the device for generating a service package based on patient data provided by this application.
In the method, apparatus, device and storage medium for generating a service package based on patient data provided by the embodiments of this application, original case data of similar diseases are first collected from a preset medical information platform, and the disease type and treatment data are extracted from the original case data. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The target treatment data are input into the preset BiLSTM model to obtain medical feature vectors of the medical data, and pooling analysis on those vectors yields a medical feedforward vector. According to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of diseases from that data. This addresses three problems in current medical services that leave the quality of mobile medical services unguaranteed: a lack of sustainable follow-up tracking, difficulty providing timely and effective feedback, and a lack of effective integration of medical data. It effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, the claims and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such process, method, product or device.
For ease of understanding, the specific process of the embodiments of this application is described below. Referring to Figure 1, a first embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
101. Collect original case data of similar diseases from a preset medical information platform, and extract the disease type and treatment data from the original case data;
In this embodiment, patients who seek treatment at the Internet hospitals built or co-built by Ping An Health generate a large amount of data, including patient big data, diagnosis and treatment big data, and disease big data.
When the business system is designed, the product adds the data feature items required by the business to its interface:
For example, during a consultation the patient first fills in the diagnosis-and-treatment card and medical record information and then consults the doctor, who prescribes medication or nursing care according to the patient's condition. Each step requires relevant information to be entered on the page, and this information is the feature information that can be summarized for health check-ups:
a. Diagnosis-and-treatment card information: patient name, gender, age, date of birth, registered household address, residential address, ID type, ID number, mobile phone number, social security card number, ethnicity, nationality, guardian information (name, phone number, ID card, etc.), marital status, etc.
b. Patient condition information: affected body part, relevant lifestyle habits, description of the condition, onset pattern, allergy history, medication history, duration of illness, severity grade, etc.
c. Consultation and diagnosis information: disease name, suspected disease name, severity of the condition, consultation advice, etc.
d. Doctor's prescription information: department, diagnosis result, prescription category, drug name, drug specification, drug brand, drug usage.
e. Nursing information: disease name, condition information, nursing items (moxibustion, massage, physiotherapy, etc.), number of nursing sessions, etc.
102. De-identify the treatment data to obtain target treatment data;
In this embodiment, de-identification refers to a data processing method that processes identifiers so that the processed information can no longer identify a specific personal information subject. The main difference between the definition of de-identification in China's "Personal Information Security Specification" and "Guide for Personal Information De-identification" and that in the relevant laws of the United States, Canada and other regions is whether the possibility of indirect identification must be considered when preventing re-identification. China limits re-identification to identification "without the aid of additional information", thereby excluding indirect identification, which is very similar to pseudonymization under the GDPR. Laws such as the CCPA and HIPAA set higher requirements for preventing re-identification: the likelihood of re-identification must be assessed in combination with any other additional information that may be obtainable.
Specifically, de-identification is the process of removing the association between a set of identifiable data and the data subject. Through this process, the data manager can delete or alter the identifying information in a data set so that an attacker finds it difficult or impossible to use the data set to identify a specific individual, allowing the data set to be shared for use within a predetermined scope. De-identification is one of the main tools of privacy-preserving data publishing (PPDP) [1]: once the association between privacy attributes and data subjects has been removed and the data set is sufficiently resistant to re-identification, certain attributes of the data set can be shared and published for processing and analysis by external business systems.
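As an illustration of pseudonymization (the GDPR-style notion mentioned above), and not the procedure claimed in this application, direct identifiers can be replaced with keyed tokens so that one patient's records remain linkable without exposing the identifier itself. The field names, the function name `pseudonymize` and the use of HMAC-SHA256 are assumptions; key management and re-identification risk assessment are out of scope for this sketch:

```python
import hashlib
import hmac

# Hypothetical field names taken from the diagnosis-and-treatment card listed earlier.
DIRECT_IDENTIFIERS = {"name", "id_number", "mobile_phone", "social_security_no"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with direct identifiers replaced by keyed tokens.

    The same identifier and key always map to the same token, so records of one
    patient can still be joined; without the key the token cannot be recomputed.
    """
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # truncated hex token for readability
        else:
            out[field] = value  # non-identifying fields pass through unchanged
    return out
```

Usage: `pseudonymize({"name": "Zhang San", "age": 34}, b"demo-key")` keeps `age` as 34 and replaces `name` with a 16-character token.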
103. Extract multiple key events from the target treatment data, and fuse the key events to obtain a medical information set for the disease corresponding to the disease type;
In this embodiment, a key event may be a core event that is defined according to the disease of the target object in the treatment data and that characterizes that disease in the diagnosis and treatment record or plan. For example, suppose the disease of the target object is breast cancer. Treatment strategies for breast cancer differ completely between clinical stages, and even within the same clinical stage the treatment options differ according to molecular subtype and pathological diagnosis. For HER2 (human epidermal growth factor receptor 2)-positive breast cancer patients, for instance, the timing of the targeted drug trastuzumab and the assessment of its efficacy are particularly critical. Key events for breast cancer therefore need to be defined; this facilitates scientific research, supports retrospective survival analysis of different treatment options for breast cancer patients, and helps further optimize their treatment strategies. Specifically, given the characteristics of breast cancer, a key event in the treatment data may be the first diagnosis and treatment event or an event related to local treatment; it may of course also be an event related to drug treatment, an efficacy evaluation event, an adverse drug reaction event, and so on, and this example embodiment is not limited thereto.
In this embodiment, data fusion refers to an information processing technique in which a computer automatically analyzes and synthesizes several observations obtained in time sequence, according to certain criteria, to complete the required decision-making and evaluation tasks. Data fusion covers the collection, transmission, integration, filtering, correlation and synthesis of useful information from various sources, in order to assist in situation/environment assessment, planning, detection, verification and diagnosis.
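The application does not specify the fusion criteria. One simple reading, assumed here, is to group key events by disease type and order each patient's events in time, so that the resulting per-disease set is a collection of event timelines. The `KeyEvent` record, its field names and the event-type strings are hypothetical, loosely following the breast-cancer example above:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KeyEvent:
    # Hypothetical key-event record extracted from de-identified treatment data.
    patient_token: str
    disease_type: str
    event_type: str  # e.g. "first_diagnosis", "local_treatment", "drug_treatment"
    when: date
    detail: str = ""


def fuse_key_events(events):
    """Group key events by disease type, then by patient, in chronological order.

    Returns {disease_type: {patient_token: [KeyEvent, ...] sorted by date}}.
    """
    fused = defaultdict(lambda: defaultdict(list))
    for ev in events:
        fused[ev.disease_type][ev.patient_token].append(ev)
    return {
        disease: {pid: sorted(evs, key=lambda e: e.when) for pid, evs in patients.items()}
        for disease, patients in fused.items()
    }
```

Ordering by time keeps questions such as "when was trastuzumab started relative to the first diagnosis" answerable from the fused set.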
104. Input the medical information set into the preset BiLSTM model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain the medical feedforward vector;
In this embodiment, the target treatment data is converted from text into vectors so that the similarity between vectors can finally be calculated, and the medical term that best matches the sentence to be analyzed is selected from multiple medical terms according to that similarity. The BiLSTM (bidirectional long short-term memory) model is a neural network model for natural language processing; it converts the target treatment data into vectors. For example, the sentence to be analyzed may be "eye swelling, eye pain, photophobia, hard eyeball, poor vision"; the first medical term group is "glaucoma, acute angle-closure glaucoma, chronic angle-closure glaucoma, primary open-angle glaucoma, filtering bleb separation", the second medical term is "myopia", the third medical term is "keratitis", and so on.
The BiLSTM model concatenates the target treatment data to obtain the original sentence vector HL, for example HL{eye swelling + eye pain + photophobia + hard eyeball + poor vision}.
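The text does not spell out how the BiLSTM produces HL. Under the standard BiLSTM formulation (an assumption, not stated in the source), each token embedding x_t of a T-token sentence is read by a forward and a backward LSTM, and the two hidden states at each position are concatenated; HL can then be read as the sequence of these concatenated states:

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fw}}\!\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bw}}\!\left(x_t, \overleftarrow{h}_{t+1}\right), \\
h_t &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right], \qquad
H_L = (h_1, h_2, \ldots, h_T).
\end{aligned}
```

With d hidden units per direction, each h_t has 2d dimensions, and the pooling analysis reduces H_L over the time index t.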
Pooling analysis refers to the pooling operation introduced in convolutional neural networks. In this embodiment, max pooling and average pooling are each applied to the target treatment data, and pooling is performed with two parallel pooling layers (max pooling combined with average pooling), so as to retain deeper semantic information of the target treatment data.
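The source does not say how the two pooling outputs are combined into the medical feedforward vector. A common choice, assumed here, is to concatenate the max-pooled and mean-pooled vectors, so a T × d matrix of BiLSTM outputs becomes a single 2d-dimensional vector. A pure-Python sketch (the name `dual_pool` is hypothetical):

```python
from typing import List, Sequence


def dual_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Max-pool and mean-pool a (time_steps x dim) matrix over time, then concatenate.

    The result has length 2 * dim: [max over time..., mean over time...].
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all time steps must have the same dimension")
    columns = list(zip(*hidden_states))  # one tuple per feature dimension
    max_pooled = [max(col) for col in columns]
    avg_pooled = [sum(col) / len(col) for col in columns]
    return max_pooled + avg_pooled
```

Max pooling keeps the strongest activation of each feature anywhere in the sentence, while average pooling keeps its overall level; running them in parallel retains both signals.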
105. According to the medical feedforward vector, perform cluster analysis on the medical information set based on the preset cosine similarity algorithm to obtain the services corresponding to the disease.
In this embodiment, cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The vectors are plotted in a vector space according to their coordinate values, the most familiar case being two-dimensional space.
Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. The cosine of a 0° angle is 1, the cosine of any other angle is less than 1, and its minimum value is -1. The cosine of the angle between two vectors therefore indicates whether they point in roughly the same direction: when two vectors point in the same direction, the cosine similarity is 1; when they are at 90° to each other, it is 0; and when they point in exactly opposite directions, it is -1. The result depends only on the directions of the vectors, not on their lengths. Cosine similarity is usually used in positive spaces, where it takes values between 0 and 1.
Note that these bounds hold for vector spaces of any dimension, and cosine similarity is most often used in high-dimensional positive spaces. In information retrieval, for example, each term is assigned its own dimension and each document is represented by a vector whose value in each dimension is the frequency with which that term occurs in the document. Cosine similarity then indicates how similar two documents are in subject matter. It is also commonly used to compare documents in text mining, and in data mining it is used to measure cohesion within clusters.
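The definition above corresponds to cos θ = (a · b) / (‖a‖ ‖b‖). A minimal pure-Python sketch (the function name is hypothetical) that also rejects zero vectors, for which the angle is undefined:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (||a|| * ||b||); independent of the vectors' lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For term-frequency vectors, which have no negative entries, the result always falls between 0 and 1.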
Cluster analysis is the process of grouping a collection of physical or abstract objects into multiple classes of similar objects; its goal is to classify data on the basis of similarity. Clustering has roots in many fields, including mathematics, computer science, statistics, biology and economics. Many clustering techniques have been developed in different application domains; they are used to describe data, measure the similarity between different data sources, and assign data sources to different clusters.
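The application names cosine similarity but not a specific clustering algorithm or threshold. As a stand-in, this sketch uses single-pass "leader" clustering: each medical feedforward vector joins the most similar existing cluster if that similarity reaches a threshold, and otherwise starts a new cluster. The algorithm choice, the threshold of 0.8 and the case keys are all assumptions:

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between a and b; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(vectors: Dict[str, Sequence[float]], threshold: float = 0.8) -> List[List[str]]:
    """Single-pass leader clustering by cosine similarity.

    Each vector joins the cluster whose leader (first member) is most similar to
    it, provided that similarity is >= threshold; otherwise it starts a new
    cluster and becomes its leader. Results depend on input order.
    """
    leaders: List[Sequence[float]] = []
    clusters: List[List[str]] = []
    for key, vec in vectors.items():
        best_idx, best_sim = -1, -math.inf
        for idx, leader in enumerate(leaders):
            sim = _cosine(vec, leader)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            clusters[best_idx].append(key)
        else:
            leaders.append(vec)
            clusters.append([key])
    return clusters
```

Each resulting cluster groups cases with similar treatment profiles, which is the kind of common-feature group from which a service package could be assembled.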
In this embodiment, a service package is also called a card product. Specifically, basic services are first created from the obtained target treatment data, and these services are then combined into service packages. A service package contains one or more services, such as disease course management, Sanfu moxibustion, COPD management, ECG monitoring, children's eye examinations, quick consultation, and worry-free hospitalization.
The services in a package are usually one closely related service or a group of closely related services. For example, the Sanfu moxibustion service package contains five Sanfu moxibustion sessions and is designed for elderly people, applying summer treatment to prevent diseases that flare up in winter. Similarly, the cardiology inpatient assistant service package is intended for patients with cardiovascular diseases and includes three services: a green channel for offline cardiology consultations, a green channel for offline cardiology examinations, and a green channel for cardiology hospitalization.
Service package classification: each service package carries one or more classification tags selected from common categories, for example by age group (children, adolescents, young adults, middle-aged and elderly people), by department (ophthalmology, cardiology, thoracic surgery, etc.), and by target population (students, office workers, pregnant women, people with the "three highs", i.e. hypertension, hyperglycemia and hyperlipidemia, etc.).
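The package structure described above (basic services combined into packages, each package carrying classification tags) can be sketched as plain data classes. The class names, field names and the tag-filter helper are hypothetical; the two example packages mirror the Sanfu moxibustion and cardiology examples:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Service:
    # A basic service; `quantity` is the number of sessions included.
    name: str
    quantity: int = 1


@dataclass
class ServicePackage:
    name: str
    services: List[Service] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)  # age group, department, target population


def packages_with_tags(packages: List[ServicePackage], required: Set[str]) -> List[str]:
    """Return the names of packages that carry every tag in `required`."""
    return [p.name for p in packages if required <= p.tags]


sanfu = ServicePackage("Sanfu moxibustion", [Service("Sanfu moxibustion session", 5)], {"elderly"})
cardio = ServicePackage(
    "Cardiology inpatient assistant",
    [Service("Offline consultation green channel"),
     Service("Offline examination green channel"),
     Service("Inpatient green channel")],
    {"cardiology"},
)
```

Keeping tags as a set makes "all packages for elderly cardiology patients" a simple subset test.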
In the embodiment of this application, original case data of similar diseases is collected from a preset medical information platform, and the disease type and treatment data are extracted from it; the treatment data is de-identified to obtain target treatment data; multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type; the target treatment data is input into the preset BiLSTM model to obtain medical feature vectors, which are pooled to obtain a medical feedforward vector; and, according to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on the original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of diseases from it. This addresses the lack of sustained follow-up services in current medical care, the difficulty of giving timely and effective feedback, and the lack of effective integration of medical data, which together make the quality of mobile medical services impossible to guarantee. The application effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
Referring to Figure 2, a second embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
201. Collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data from the original case data;
202. Preset diagnosis and treatment templates for different disease types;
In this embodiment, before original case data of similar diseases is collected from the preset medical information platform and the disease type and treatment data are extracted, different diagnosis and treatment templates are configured for different types of patients, that is, patients who seek treatment for different reasons (specifically chest pain, stroke and trauma). Each of these reasons generally has a corresponding workflow of examination, diagnosis and treatment, so a diagnosis and treatment template can be built from that workflow and used later to verify the chronological order of the entered information.
203. Obtain the time information corresponding to the treatment data;
In this embodiment, an information entry template is configured, and the patient's treatment data entered by the user is collected through this template.
Before the treatment data is obtained, the information entry template is configured. The user enters information following the prompts of the preconfigured template, which collects patient information, reduces errors and omissions during entry, and lowers the frequency of later corrections. Specifically, the information entry template includes a sub-template for each diagnosis and treatment item of each patient type.
Preferably, obtaining the time information of each item of treatment data specifically comprises taking the time at which the user entered each item of the patient's treatment data as the corresponding time information. In general, treatment data is entered at the same time as the patient undergoes each diagnosis and treatment item, so the entry time can serve as the time information of the corresponding treatment data; this records time information automatically and reduces workload. After the time information has been recorded automatically, the user may also be prompted to check it and correct it if it is wrong.
204. Obtain, according to the disease type, the diagnosis and treatment template corresponding to the treatment data;
In this embodiment, the diagnosis and treatment template is generated according to the patient's disease type and its sequence requirements. Specifically, this embodiment defines a template for STEMI (ST-elevation myocardial infarction) patients comprising ten diagnosis and treatment items: symptom onset, acquisition of the first in-hospital ECG, diagnosis of the first in-hospital ECG, start of informed consent, starting the catheterization lab, activating the catheterization lab, patient arrival at the catheterization lab, decision on interventional surgery, signing of informed consent, and guidewire crossing. The template also specifies three sequence requirements: the start of informed consent must be later than the starting of the catheterization lab; the starting of the catheterization lab must be later than the decision on interventional surgery; and the signing of informed consent must be later than the start of informed consent. The treatment data of patients of the corresponding type is verified against this template.
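The three sequence requirements of the STEMI template can be encoded as (earlier item, later item) pairs and checked against the recorded timestamps. The item keys and the function name are hypothetical; items that have not been recorded are skipped rather than flagged, which is a design assumption:

```python
from datetime import datetime
from typing import Dict, List, Tuple

# (earlier_item, later_item): the later item's timestamp must be strictly later.
STEMI_ORDER_RULES: List[Tuple[str, str]] = [
    ("start_catheter_lab", "start_informed_consent"),
    ("decide_intervention", "start_catheter_lab"),
    ("start_informed_consent", "sign_informed_consent"),
]


def check_order(times: Dict[str, datetime], rules=STEMI_ORDER_RULES) -> List[str]:
    """Return one message per violated rule; an empty list means the order is valid."""
    problems = []
    for earlier, later in rules:
        if earlier in times and later in times and not times[later] > times[earlier]:
            problems.append(f"{later} must be later than {earlier}")
    return problems
```

Because the entry time doubles as the event time (step 203), a violation may indicate either an out-of-order workflow or a mistyped time that the user should correct.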
205. Verify the treatment data against the diagnosis and treatment template to determine whether its chronological order is correct;
In this embodiment, the diagnosis and treatment template further includes a duration indicator for each diagnosis and treatment item. The actual duration of each item is calculated from the time information of the current patient's treatment data, and it is determined whether the actual duration meets the corresponding indicator; if so, the treatment data is reported, otherwise a warning prompt is issued.
In addition to the chronological order of the diagnosis and treatment items, this embodiment also verifies their duration requirements. When the STEMI patient template is set up, a duration indicator is set for each item, for example longer than a set time, shorter than a set time, or within a set time range. The actual duration of each item is then calculated from the time information of the entered treatment data, and it is determined whether the actual duration meets the indicator.
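A duration indicator of the kinds listed above (longer than, shorter than, or within a range) can be modeled as an optional lower and upper bound on the time between two items. The item keys, the function names and the example limits in the usage note are hypothetical illustrations, not clinical targets taken from the application:

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# (min, max) bounds on an item's duration; None means unbounded on that side.
DurationRule = Tuple[Optional[timedelta], Optional[timedelta]]


def check_durations(times: Dict[str, datetime],
                    indicators: Dict[Tuple[str, str], DurationRule]) -> Dict[Tuple[str, str], bool]:
    """Compute each item's actual duration (end - start) and test it against its indicator."""
    results = {}
    for (start, end), (lo, hi) in indicators.items():
        if start not in times or end not in times:
            continue  # item not recorded yet, nothing to check
        actual = times[end] - times[start]
        results[(start, end)] = (lo is None or actual >= lo) and (hi is None or actual <= hi)
    return results


def report_or_warn(results: Dict[Tuple[str, str], bool]) -> str:
    # As described above: report when every indicator is met, otherwise warn.
    return "report" if all(results.values()) else "warning"
```

Usage: with an indicator of at most 10 minutes from first ECG to ECG diagnosis and at most 30 minutes from cath-lab arrival to guidewire crossing (both made-up limits), an 8-minute and a 50-minute interval yield `True`, `False` and an overall `"warning"`.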
206. De-identify the treatment data to obtain target treatment data;
207. Extract multiple key events from the target treatment data, and fuse the key events to obtain a medical information set for the disease corresponding to the disease type;
208. Input the medical information set into the preset BiLSTM model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain the medical feedforward vector;
209. According to the medical feedforward vector, perform cluster analysis on the medical information set based on the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
Steps 201 and 206-209 in this embodiment are similar to steps 101 and 102-105 in the first embodiment and are not described again here.
In the embodiment of this application, original case data of similar diseases is collected from the preset medical information platform, and the disease type and treatment data are extracted from it; the treatment data is de-identified to obtain target treatment data; multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type; the target treatment data is input into the preset BiLSTM model to obtain medical feature vectors, which are pooled to obtain a medical feedforward vector; and, according to the medical feedforward vector, cluster analysis is performed on the medical information set based on the preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on the original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of diseases from it. This addresses the lack of sustained follow-up services in current medical care, the difficulty of giving timely and effective feedback, and the lack of effective integration of medical data, which together make the quality of mobile medical services impossible to guarantee. The application effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
Referring to Figure 3, a third embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
301. Collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data from the original case data;
302. Construct a treatment data query database from the original case data;
In this embodiment, a treatment data query database is constructed according to the imaging characteristics of the disease and clinical treatment data, combining the patient's treatment data from the HIS, RIS and PACS databases. A combination of a differential privacy algorithm and an encryption algorithm is used to de-identify the patients' private data and update the query database, so that a clinical image query and diagnosis system can be built on the updated database. This embodiment both meets the confidentiality requirements of the differential privacy protection model and ensures the reliability of the data published in the database. It helps clinical researchers query and collect past cases and perform big data analysis and evaluation, and lays a good foundation for automating medical data statistics, eliminating information silos, and providing decision support.
303. According to the differential privacy algorithm, add random noise to the sensitive attribute fields in the data tables of the treatment data query database;
In this embodiment, the differential privacy algorithm adds an appropriate amount of noise to the statistical results so that modifying a single individual record in the data set does not significantly affect those results, thereby meeting the privacy protection requirement.
Assume that D_1 and D_2 are adjacent data sets (differing in a single record), A is a randomized algorithm, S is any set of possible outputs of A, and Pr[A(D_1) ∈ S] is the probability that the output of A on D_1 falls in S. The algorithm A satisfies ε-differential privacy if, for all such D_1, D_2 and S, the following holds:

$$\Pr[A(D_1) \in S] \le e^{\varepsilon} \times \Pr[A(D_2) \in S]$$
Here the probability Pr[·] represents the risk of privacy leakage and is governed by the randomness of algorithm A; ε is the privacy budget parameter, which balances data privacy against data reliability. Privacy protection is achieved by adding random noise: the smaller ε is, the more noise is added and the stronger the privacy protection; conversely, the larger ε is, the less noise is added and the weaker the privacy protection.
Optionally, different noise mechanisms are used to add random noise to the original treatment data of sensitive fields of different data types. For example, the Laplace mechanism is used for numeric sensitive fields, and the exponential mechanism is used for non-numeric sensitive fields.
拉普拉斯机制对数值型数据(连续数据)进行处理,比如患者年龄,对得到数值结果加入随机噪声即可实现差分隐私。指数机制处理非数值型(离散数据)数据,返回的不是确定性的结果,而是以一定概率值返回结果,输出是一组离散数据,可以由打分函数确定,得分高的输 出概率高,得分低的输出概率低。The Laplacian mechanism processes numerical data (continuous data), such as patient age, and adds random noise to the numerical results to achieve differential privacy. The exponential mechanism processes non-numeric (discrete data) data, and does not return deterministic results, but returns results with a certain probability value. The output is a set of discrete data, which can be determined by the scoring function. The output with a high score has a high probability, and the score Low output probability is low.
具体地对表中患者年龄、检查日期等数值类型敏感属性分别添加拉普拉斯噪声,对数据表中性别、教育程度、地区、检查设备类型、疾病等属性分别添加指数噪声,得到噪声结果,替换到数据表中。Specifically, Laplacian noise is added to numerically sensitive attributes such as patient age and examination date in the table, and exponential noise is added to attributes such as gender, education level, region, examination equipment type, and disease in the data table to obtain the noise results. Replace it in the data table.
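The sketch below shows the two mechanisms applied to one table row. It uses the standard textbook forms: Laplace noise with scale b = Δf/ε, and exponential-mechanism selection with probability proportional to exp(ε·u/(2Δu)). Several details are hypothetical:

- the field names;
- the per-field sensitivities;
- the category domains;
- the 0/1 utility function;
- the representation of the examination date as a day number.

Note that spending ε on each field consumes privacy budget additively under sequential composition.

```python
import math
import random
from typing import Dict, Mapping, Sequence


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Add Laplace(0, b) noise with b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


def exponential_mechanism(true_value: str, candidates: Sequence[str],
                          epsilon: float, rng: random.Random) -> str:
    """Pick a candidate with probability proportional to exp(epsilon * u / 2).

    Hypothetical utility u: 1 for the true category and 0 otherwise, so the
    utility sensitivity is 1.
    """
    weights = [math.exp(epsilon * (1.0 if c == true_value else 0.0) / 2)
               for c in candidates]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


def perturb_record(record: Mapping[str, object],
                   numeric_fields: Mapping[str, float],
                   categorical_domains: Mapping[str, Sequence[str]],
                   epsilon: float, rng: random.Random) -> Dict[str, object]:
    """Return a copy of record with noise added to the listed sensitive fields.

    numeric_fields maps field name -> assumed sensitivity;
    categorical_domains maps field name -> possible values.
    Fields that are not listed are copied unchanged.
    """
    noisy = dict(record)
    for field, sensitivity in numeric_fields.items():
        noisy[field] = laplace_mechanism(float(record[field]), sensitivity, epsilon, rng)
    for field, domain in categorical_domains.items():
        noisy[field] = exponential_mechanism(str(record[field]), domain, epsilon, rng)
    return noisy


rng = random.Random(7)
row = {"age": 52, "exam_day": 19100, "gender": "F", "region": "north", "visit_fee": 120}
print(perturb_record(row, {"age": 1.0, "exam_day": 1.0},
                     {"gender": ["F", "M"], "region": ["north", "south", "east", "west"]},
                     epsilon=1.0, rng=rng))
```

Only the listed sensitive fields are perturbed. In this example, the hypothetical `visit_fee` column is treated as non-sensitive and passes through unchanged.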
304. Based on the random noise, de-identify the sensitive attribute fields in the original case data to obtain the identifier fields;
In this embodiment, the fields of the original database fall roughly into the following categories. Explicit identifiers are attribute sets that uniquely identify a single individual, such as the patient name and patient number fields in the table. Sensitive attributes are attribute sets containing private data, such as patient gender, age, education level, region, examination equipment type, and disease. Non-sensitive attributes are all attributes not in the above categories.
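A minimal sketch of the three-way column classification just described follows. The column names in the two sets are hypothetical placeholders for a real schema. Downstream, identifier columns would be encrypted (step 305), sensitive columns perturbed with noise (step 303), and non-sensitive columns passed through.

```python
from typing import Dict, Iterable, List

# Hypothetical column names for the categories described above.
EXPLICIT_IDENTIFIERS = {"patient_name", "patient_no"}
SENSITIVE_ATTRIBUTES = {"gender", "age", "education", "region", "device_type", "disease"}


def classify_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Split table columns into explicit identifiers, sensitive and non-sensitive attributes."""
    groups: Dict[str, List[str]] = {"identifier": [], "sensitive": [], "non_sensitive": []}
    for col in columns:
        if col in EXPLICIT_IDENTIFIERS:
            groups["identifier"].append(col)
        elif col in SENSITIVE_ATTRIBUTES:
            groups["sensitive"].append(col)
        else:
            groups["non_sensitive"].append(col)  # everything not listed above
    return groups


print(classify_columns(["patient_no", "patient_name", "age", "disease", "ward"]))
```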
305. Encrypt the identifier fields to obtain the patient's target treatment data;
In this embodiment, the identifier fields of the data table are encrypted using the Data Encryption Standard (DES) algorithm combined with Base64 encoding. DES is a long-established symmetric encryption algorithm, although its 56-bit key is considered too short to be secure by current standards. Base64 is a scheme for representing binary data with 64 printable characters. Combining the two meets the encryption requirements for uniquely identifying sensitive attributes such as patient names, examination numbers, and image numbers.
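The Python standard library has no DES implementation, and DES is weak by modern standards. This sketch therefore swaps in a different technique: keyed HMAC-SHA256 pseudonymization followed by Base64 encoding. Unlike DES, this is one-way and cannot be decrypted. It still maps the same identifier to the same token, so records can be linked without exposing names or numbers. If the original identifiers must be recoverable, a modern reversible cipher such as AES from a vetted library would be needed instead. The key value and field names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict, Iterable, Mapping


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed one-way token: HMAC-SHA256 of the value, Base64-encoded (not DES)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def protect_identifiers(record: Mapping[str, object], identifier_fields: Iterable[str],
                        key: bytes) -> Dict[str, object]:
    """Replace each identifier field with its pseudonym; other fields are copied."""
    out = dict(record)
    for field in identifier_fields:
        out[field] = pseudonymize(str(record[field]), key)
    return out


SECRET_KEY = b"replace-with-a-key-from-a-secrets-store"  # hypothetical key
row = {"patient_name": "Zhang San", "exam_no": "E2022-0042", "age": 52}
print(protect_identifiers(row, ["patient_name", "exam_no"], SECRET_KEY))
```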
306. Extract multiple key events from the target treatment data and fuse them to obtain a medical information set for the disease corresponding to the disease type;
307. Input the medical information set into the preset BiLSTM model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain the medical feedforward vector;
308. Based on the medical feedforward vector, perform cluster analysis on the medical information set using the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
Steps 301 and 306-308 in this embodiment are similar to steps 101 and 103-105 in the first embodiment and are not described again here.
In the embodiments of this application, the method proceeds as follows. Original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data are extracted from them. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The medical information set is input into a preset BiLSTM model to obtain the medical feature vector of the disease, and pooling analysis of that vector yields a medical feedforward vector. Finally, based on the medical feedforward vector, cluster analysis is performed on the medical information set using a preset cosine similarity algorithm to obtain the medical service package corresponding to the disease. By clustering the original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of disease from them. This addresses several problems of current medical services: the lack of sustained follow-up services, the difficulty of providing timely and effective feedback, and the lack of effective integration of medical data, which together make the quality of mobile medical services difficult to guarantee. The application effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
Referring to FIG. 4, a fourth embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
401. Collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data from the original case data;
402. De-identify the treatment data to obtain target treatment data;
403. Obtain a predefined key event set;
In this embodiment, a predefined key event set is obtained, and multiple key events corresponding to the target subject are extracted from the treatment data according to that set. The predefined key event set may be a set of key events for a disease, defined by staff according to the particular characteristics of each disease at its different clinical stages. That is, the set may include the key events of the target disease at each of its clinical stages. Predefining the key event set helps ensure both the accuracy of the extracted key events and the efficiency of extraction.
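The following is a minimal sketch of a predefined key event set keyed by disease and clinical stage, together with an extraction step that keeps only matching records. The disease, stages, event types and record layout are all hypothetical. Updating the set, as described in the next paragraph, amounts to editing this mapping.

```python
from typing import Dict, List, Mapping, Set

# Hypothetical predefined key event set: disease -> clinical stage -> event types.
KEY_EVENT_SET: Dict[str, Dict[str, Set[str]]] = {
    "lung_cancer": {
        "diagnosis": {"first_diagnosis", "biopsy"},
        "treatment": {"surgery", "chemotherapy", "radiotherapy"},
        "follow_up": {"relapse", "adverse_drug_reaction"},
    },
}


def extract_key_events(records: List[Mapping[str, str]], disease: str) -> List[Dict[str, str]]:
    """Keep records whose event type is a key event for the disease, tagged with its stage."""
    stages = KEY_EVENT_SET.get(disease, {})
    events: List[Dict[str, str]] = []
    for rec in records:
        for stage, event_types in stages.items():
            if rec["event"] in event_types:
                events.append({**rec, "stage": stage})
                break  # each record belongs to at most one stage
    return events


records = [{"event": "biopsy", "date": "2022-01-03"},
           {"event": "blood_test", "date": "2022-01-04"},
           {"event": "relapse", "date": "2023-02-01"}]
print(extract_key_events(records, "lung_cancer"))
```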
Preferably, the predefined key event set is updated periodically, or immediately when new key events appear. Updates may be made manually, with crawler tools, or with artificial intelligence; this example embodiment places no particular limitation on this.
404. Screen the target treatment data to filter out invalid medical data and obtain valid medical data;
In this embodiment, the treatment data are screened to filter out invalid medical data, and multiple key events corresponding to the target subject are extracted from the filtered treatment data according to the key event set. Invalid medical data are treatment data irrelevant to the diagnosis and treatment of the target subject's target disease. Examples include treatment data for a disease other than the target disease, and incomplete treatment data for the target disease, such as records of a treatment discontinued midway for special reasons. Other treatment data irrelevant to the diagnosis and treatment of the target disease may also count as invalid; this example embodiment places no particular limitation on this. Filtering out invalid medical data improves the accuracy of the treatment data and, in turn, the accuracy of key event extraction. It also avoids unnecessary computation and saves computing resources.
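A minimal sketch of the screening step follows. A record is treated as invalid if it concerns a non-target disease, belongs to a treatment that was discontinued, or lacks required fields. The field names, the `status` flag and the required-field list are hypothetical assumptions about the record layout.

```python
from typing import List, Mapping

REQUIRED_FIELDS = ("patient_id", "disease", "date", "event")  # hypothetical schema


def filter_invalid(records: List[Mapping[str, object]],
                   target_disease: str) -> List[Mapping[str, object]]:
    """Drop records for other diseases, discontinued treatments, or missing required fields."""
    valid = []
    for rec in records:
        if rec.get("disease") != target_disease:
            continue  # treatment data for a non-target disease
        if rec.get("status") == "discontinued":
            continue  # incomplete treatment stopped midway
        if any(rec.get(f) in (None, "") for f in REQUIRED_FIELDS):
            continue  # incomplete record
        valid.append(rec)
    return valid
```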
405. Extract key events from the valid medical data according to the key event set to obtain multiple key events in the target treatment data;
In one example embodiment, a second key event is an event determined by extracting key events from multiple treatment records taken as a whole. For example, a second key event may be the first diagnosis-and-treatment event across all of the target subject's treatment records, or an adverse drug reaction event across all treatment records. A second key event may also be determined by jointly evaluating treatment records that span several visits. For example, to find the first chemotherapy event after relapse:

1. Identify the first visit at which the first relapse was recorded.
2. Extract chemotherapy drugs from the drug orders of the visits after it.
3. Take the first of those visits at which a chemotherapy drug appears; it marks the first chemotherapy event after relapse.

This is only an illustrative example and imposes no particular limitation on this example embodiment.
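The cross-visit example above can be sketched as follows. The visit record layout, the "relapse" diagnosis label and the chemotherapy drug list are hypothetical. As in the text, only visits after the relapse visit are scanned. To also count chemotherapy ordered at the relapse visit itself, change the slice to start at `relapse_idx`.

```python
from datetime import date
from typing import List, Optional, TypedDict


class Visit(TypedDict):
    visit_date: date
    diagnoses: List[str]
    drug_orders: List[str]


# Hypothetical chemotherapy drug list; a real system would use a drug dictionary.
CHEMO_DRUGS = {"cisplatin", "carboplatin", "paclitaxel", "docetaxel"}


def first_chemo_after_relapse(visits: List[Visit]) -> Optional[Visit]:
    """Cross-visit key event: first visit with a chemotherapy order after the first relapse."""
    ordered = sorted(visits, key=lambda v: v["visit_date"])
    # Step 1: the first visit at which a relapse is recorded.
    relapse_idx = next((i for i, v in enumerate(ordered) if "relapse" in v["diagnoses"]), None)
    if relapse_idx is None:
        return None
    # Steps 2-3: scan later visits' drug orders for the first chemotherapy drug.
    for visit in ordered[relapse_idx + 1:]:
        if CHEMO_DRUGS.intersection(visit["drug_orders"]):
            return visit
    return None
```

Sorting first means the input visits may arrive in any order. Chemotherapy given before the relapse, such as first-line treatment, is correctly ignored.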
In another example embodiment, two kinds of events together form, in chronological order, the key events corresponding to the target subject's treatment data: first key events extracted from individual sorted treatment records, and second key events extracted from multiple sorted treatment records. Building the key events from both kinds avoids missing or inaccurate key events. Such errors arise when screening is not fine-grained enough, or when an event can only be determined by jointly evaluating treatment records across multiple visits. Combining both kinds therefore improves the accuracy of the key events.
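The following is a minimal sketch of combining the two kinds of key events into one chronological timeline. The `KeyEvent` tuple and its fields are hypothetical. Python's `sorted` is stable, so events sharing a date keep their input order, with first key events ahead of second key events.

```python
from datetime import date
from typing import List, NamedTuple


class KeyEvent(NamedTuple):
    when: date
    name: str
    source: str  # "single" (first key event) or "cross" (second key event)


def build_timeline(first_events: List[KeyEvent],
                   second_events: List[KeyEvent]) -> List[KeyEvent]:
    """Merge per-record and cross-record key events into one chronological list."""
    return sorted(first_events + second_events, key=lambda e: e.when)


first = [KeyEvent(date(2022, 5, 1), "surgery", "single"),
         KeyEvent(date(2021, 1, 5), "biopsy", "single")]
second = [KeyEvent(date(2021, 1, 5), "first_diagnosis", "cross"),
          KeyEvent(date(2023, 3, 1), "first_chemo_after_relapse", "cross")]
for event in build_timeline(first, second):
    print(event.when, event.name, event.source)
```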
406. Determine the attribute features corresponding to the key events and the weight values of those attribute features;
In this embodiment, the attribute features corresponding to the key events and their weight values are determined. Score data for multiple target key events are then computed from the attribute features and weight values. Based on the score data, the target key events are deduplicated to filter out events with the same meaning. A weight value is a preset weight for an attribute feature; for example, the time attribute may have a weight of 0.3 and the recurrence type a weight of 0.7. The weights can be customized to the actual situation, and this example embodiment places no particular limitation on this. The score data are similarity data used to judge whether two key events are similar. Two key events whose mutual score is sufficiently high are regarded as similar, and only one of them is retained.
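The sketch below shows the weighted scoring and deduplication. It uses the example weights from the text (time 0.3, recurrence type 0.7). The score of a pair is the sum of the weights of the attributes on which the two events agree. The 0.7 threshold is a hypothetical choice: with it, two events sharing a recurrence type count as duplicates even if their times differ.

```python
from typing import Dict, List, Mapping

# Weights from the example in the text; the threshold below is a hypothetical choice.
WEIGHTS: Dict[str, float] = {"time": 0.3, "recurrence_type": 0.7}


def similarity_score(a: Mapping[str, object], b: Mapping[str, object],
                     weights: Mapping[str, float] = WEIGHTS) -> float:
    """Sum the weights of attribute features present in both events with equal values."""
    return sum(w for attr, w in weights.items()
               if attr in a and attr in b and a[attr] == b[attr])


def deduplicate(events: List[Mapping[str, object]], threshold: float = 0.7,
                weights: Mapping[str, float] = WEIGHTS) -> List[Mapping[str, object]]:
    """Keep an event only if its score against every already-kept event is below threshold."""
    kept: List[Mapping[str, object]] = []
    for event in events:
        if all(similarity_score(event, k, weights) < threshold for k in kept):
            kept.append(event)
    return kept


events = [{"time": "2022-01", "recurrence_type": "local"},
          {"time": "2022-02", "recurrence_type": "local"},    # same type -> duplicate
          {"time": "2022-01", "recurrence_type": "distant"}]  # same time only -> kept
print(deduplicate(events))
```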
407. Determine score data for the key events according to the attribute features and weight values, and obtain, based on the score data, the medical information set for the disease corresponding to the disease type;
In this embodiment, fusion processing is the process of assembling multiple key events into target data in a specific order; for example, it may include, but is not limited to, arranging the key events chronologically. The target data are structured data, generated by fusing multiple key events, that correspond to the target subject's treatment data.
Before the key events are fused, their attribute features are determined. Attribute features are the different attributes of a key event, such as its time attribute or its recurrence type. Other attributes may also be used, and this example embodiment places no particular limitation on this.
408. Input the medical information set into the preset BiLSTM model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain the medical feedforward vector;
409. Based on the medical feedforward vector, perform cluster analysis on the medical information set using the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
Steps 401-402 and 408-409 in this embodiment are similar to steps 101-102 and 104-105 in the first embodiment and are not described again here.
In the embodiments of this application, the method proceeds as follows. Original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data are extracted from them. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The medical information set is input into a preset BiLSTM model to obtain the medical feature vector of the disease, and pooling analysis of that vector yields a medical feedforward vector. Finally, based on the medical feedforward vector, cluster analysis is performed on the medical information set using a preset cosine similarity algorithm to obtain the medical service package corresponding to the disease. By clustering the original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of disease from them. This addresses several problems of current medical services: the lack of sustained follow-up services, the difficulty of providing timely and effective feedback, and the lack of effective integration of medical data, which together make the quality of mobile medical services difficult to guarantee. The application effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
Referring to FIG. 5, a fifth embodiment of the method for generating a service package based on patient data in the embodiments of this application includes:
501. Collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data from the original case data;
502. De-identify the treatment data to obtain target treatment data;
503. Extract multiple key events from the target treatment data and fuse them to obtain a medical information set for the disease corresponding to the disease type;
504. Perform feature extraction on the medical feature vector to obtain its target medical features;
In this embodiment, the medical feature vector first passes through convolutional layers with different window sizes, and the filter units beneath them extract features. The result is fed into two parallel pooling layers for dimensionality reduction: an avg pooling layer and a max pooling layer. This combines the max pooling layer's ability to extract salient features dynamically with the avg pooling layer's contribution of the average semantics of short texts, reducing the loss of semantic information during dimensionality reduction. Finally, the concatenation layer performs the necessary semantic splicing to form the original feedforward vector and the medical feedforward vector. Both pooling methods take into account how the height of the convolution kernel's sliding window affects the generated feature map. Specifically, the kernel height serves as an important basis for the number M of down-sampled values taken from each feature map: the taller the kernel, the fewer the down-sampled values, and the shorter the kernel, the more.
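The following is a deliberately simplified illustration of the convolution plus parallel max/avg pooling described above. It uses several simplifications:

- It works on a 1-D sequence of scalars rather than a BiLSTM output matrix.
- The kernel weights are fixed rather than learned.
- The rule linking kernel height to the number of down-sampled values M is hypothetical. It only reproduces the qualitative relation in the text: taller kernels give a smaller M.

Each feature map is split into M contiguous segments, and the max and the mean of each segment form the two parallel branches. These are then concatenated.

```python
from typing import List, Sequence, Tuple


def conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' sliding-window convolution (cross-correlation, as in most CNN libraries)."""
    h = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(h)) for i in range(len(x) - h + 1)]


def num_downsamples(seq_len: int, kernel_height: int) -> int:
    """Hypothetical rule for M: taller kernels give fewer down-sampled values."""
    feature_len = seq_len - kernel_height + 1
    return max(1, feature_len // kernel_height)


def segment_pool(feature_map: List[float], m: int) -> Tuple[List[float], List[float]]:
    """Split the feature map into m contiguous segments; return per-segment max and mean."""
    n = len(feature_map)
    bounds = [k * n // m for k in range(m + 1)]  # m <= n, so no segment is empty
    segments = [feature_map[bounds[k]:bounds[k + 1]] for k in range(m)]
    return [max(s) for s in segments], [sum(s) / len(s) for s in segments]


def dual_pool_features(x: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """Convolve with each kernel, pool with parallel max/avg branches, and concatenate."""
    out: List[float] = []
    for kernel in kernels:
        fmap = conv1d(x, kernel)
        maxes, avgs = segment_pool(fmap, num_downsamples(len(x), len(kernel)))
        out.extend(maxes + avgs)
    return out


x = [float(v) for v in range(1, 11)]
print(dual_pool_features(x, [[1.0, 1.0], [1.0, 1.0, 1.0]]))
```

With a 10-element input, the height-2 kernel yields M = 4 segments and the height-3 kernel yields M = 2. The concatenated vector therefore has 2 × (4 + 2) = 12 entries.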
505. Obtain the frequency of the target medical features;
In this embodiment, the frequency of a target medical feature may be how often it appears across different historical analysis models, and this frequency can differ from feature to feature. For example, surgical fees may appear in many different historical analysis models and thus have a high frequency, whereas glucose level may appear only in a diabetes surgery analysis model and thus have a low frequency.
Specifically, the more frequently a medical feature appears across different historical analysis models, the more important it can be considered; a feature that appears in every historical analysis model is the strongest case. Target medical features can thus be determined from how often features appear in different historical analysis models.
A frequency meets the requirement if the feature's frequency of appearance in the historical analysis models satisfies a threshold condition. Alternatively, the features can be ranked by frequency, and those whose rank satisfies a given requirement are taken as target medical features.
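A minimal sketch of frequency-based selection follows, supporting both the threshold rule and the ranking rule described above. Each historical analysis model is represented only by the list of features it uses. The model names, feature names and cut-off values are hypothetical. A feature is counted at most once per model, and ties are broken alphabetically so the ranking is deterministic.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_target_features(models: Dict[str, List[str]],
                           min_fraction: Optional[float] = None,
                           top_k: Optional[int] = None) -> List[str]:
    """Rank features by how many historical models use them, then apply the cut-offs."""
    counts: Counter = Counter()
    for features in models.values():
        counts.update(set(features))  # count each feature once per model
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if min_fraction is not None:  # threshold rule
        ranked = [(f, c) for f, c in ranked if c / len(models) >= min_fraction]
    if top_k is not None:  # ranking rule
        ranked = ranked[:top_k]
    return [f for f, _ in ranked]


history = {
    "appendectomy": ["surgery_fee", "drug_fee", "exam_fee"],
    "diabetes_surgery": ["surgery_fee", "glucose", "drug_fee"],
    "fracture": ["surgery_fee", "exam_fee"],
}
print(select_target_features(history, min_fraction=0.5))
print(select_target_features(history, top_k=1))
```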
506. Extract the first data corresponding to the target medical features from the target treatment data;
In this embodiment, multidimensional data may refer to all data stored in the database, including data added at each data change as well as historical data from before the change. For example, for the medical insurance data mentioned above, the initial data are the medical visit data generated after a user's visits and stored under the user's name. They may include historical visit data and the current visit data. Specifically, they may include, but are not limited to:

- consultation location, consultation time, and International Classification of Diseases (ICD) code;
- registration department, registered doctor information, registration fee, and payment method;
- examination items, examination fees, description of the condition, and medical advice;
- drug list, drug prices, and drug dosage;
- payment window and pharmacy window;
- whether a follow-up visit is needed, follow-up time, and number of consultations.

Specifically, the server can extract initial data from the multidimensional data based on the selected target medical features. The extracted initial data can fall into several categories. For medical insurance data, these may include, but are not limited to, current-visit cost data, current-visit ICD data, and historical visit data. Current-visit cost data may include surgery fees, drug fees, examination fees, and so on. Current-visit ICD data may include the cost of the ICD code diagnosed at this visit, the average cost for that ICD code, and so on. Historical visit data may include the numbers of local outpatient visits, local hospitalizations, out-of-area outpatient visits, and out-of-area hospitalizations, as well as the proportions of local and out-of-area outpatient visits.
507. Process the first data of different types to obtain standard data;
In this embodiment, the extracted initial data are of different types, so their orders of magnitude may differ greatly. For example, a drug fee may be 500 while the total cost is 1,000,000.
The server can apply a common scaling method to bring initial data of different orders of magnitude to the same order of magnitude, yielding standard data. Continuing the example above, scaling the drug fee and the total cost into the range 0 to 100 (that is, dividing both by 10,000) gives a standard drug fee of 0.05 and a standard total cost of 100.
Specifically, the scaling method can be chosen according to the data type or the order of magnitude. Options include square root, square, cube, exponential, or logarithmic transformation; this application places no limitation on this.
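The scaling example can be reproduced with a short sketch that divides all values by one common factor, chosen so that the largest magnitude becomes 100. For comparison, a logarithmic transform is included as one of the alternatives listed above. The function names and the use of log10(1 + v), which assumes non-negative values, are illustrative choices.

```python
import math
from typing import List, Sequence


def scale_to_range(values: Sequence[float], upper: float = 100.0) -> List[float]:
    """Divide every value by one common factor so the largest magnitude becomes `upper`."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0.0 for _ in values]  # nothing to scale
    factor = largest / upper
    return [v / factor for v in values]


def log_scale(values: Sequence[float]) -> List[float]:
    """Alternative: compress magnitudes with log10(1 + v), assuming non-negative values."""
    return [math.log10(1 + v) for v in values]


fees = [500.0, 1_000_000.0]  # drug fee and total cost from the example
print(scale_to_range(fees))  # [0.05, 100.0] (common factor 10,000)
print(log_scale(fees))
```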
508. Perform dimensionality reduction on the standard data to obtain target data with preset dimensions;
In this embodiment, the preset dimensions may be set in advance on the server by the user, through a terminal, according to the needs of subsequent data processing. The target data in the preset dimensions may be smaller in volume than the standard data. Nonlinear dimensionality reduction methods may include, but are not limited to, Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), Modified Locally Linear Embedding (MLLE), Hessian Eigenmapping, Spectral Embedding, Local Tangent Space Alignment (LTSA), Multi-dimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE).
In practice, linear dimensionality reduction methods can also be used, including but not limited to Principal Component Analysis (PCA) and Incremental PCA. Kernel PCA, a nonlinear extension of PCA, may also be used. Specifically, the server can apply the methods above to exploit the clustering characteristics of the data in a high-dimensional Riemannian space. It thereby maps the high-dimensional standard data to a lower dimension, for example two dimensions, obtaining the target data.
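As a dependency-free illustration of the linear option, the sketch below implements PCA to k dimensions with power iteration and deflation on the sample covariance matrix. It is suitable only for small examples. In practice, library implementations such as scikit-learn's `sklearn.decomposition.PCA`, or the estimators in `sklearn.manifold` for the nonlinear methods, would normally be used. The sample data are made up for illustration.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def _center(data: Matrix) -> Matrix:
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def _covariance(x: Matrix) -> Matrix:
    n, d = len(x), len(x[0])  # requires at least two rows
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigvec(c: Matrix, rng: random.Random, iters: int = 1000) -> List[float]:
    """Power iteration: converges to the dominant eigenvector of a PSD matrix."""
    v = [rng.gauss(0.0, 1.0) for _ in range(len(c))]  # random start avoids orthogonality
    norm = math.sqrt(sum(t * t for t in v))
    v = [t / norm for t in v]
    for _ in range(iters):
        w = _matvec(c, v)
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break  # no variance left in any direction
        v = [t / norm for t in w]
    return v


def pca(data: Matrix, k: int = 2, seed: int = 0) -> Matrix:
    """Project the rows of `data` onto the top-k principal components."""
    rng = random.Random(seed)
    x = _center(data)
    c = _covariance(x)
    components = []
    for _ in range(k):
        v = _top_eigvec(c, rng)
        lam = sum(a * b for a, b in zip(v, _matvec(c, v)))  # Rayleigh quotient = eigenvalue
        components.append(v)
        # Deflation: remove lam * v v^T so the next pass finds the next component.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in x]


sample = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.0], [3.0, 5.9, 9.2], [4.0, 8.2, 11.9], [5.0, 9.9, 15.1]]
print(pca(sample, k=2))
```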
The above dimensionality reduction method works as follows. Target medical features are obtained from the historical analysis models, and the initial data corresponding to them are extracted from the multidimensional data. These data are scaled to a common order of magnitude to obtain standard data, and nonlinear dimensionality reduction of the standard data yields target data with preset dimensions. Because the target data are generated from, and remain associated with, the multidimensional data, they preserve its characteristics and can be used for subsequent data processing and analysis.
509. Perform pooling on the target data to obtain the medical feedforward vector of the target treatment data;
In this embodiment, the pooling analysis is the pooling operation used in convolutional neural networks. Max pooling and average (avg) pooling are applied to the medical feature vector in two parallel pooling layers, which preserves deeper semantic information in the medical feature vector.
510. Based on the medical feedforward vector, perform cluster analysis on the medical information set using the preset cosine similarity algorithm to obtain the medical service package corresponding to the disease.
Steps 501-503 and 510 in this embodiment are similar to steps 101-103 and 105 in the first embodiment and are not described again here.
In the embodiments of this application, the method proceeds as follows. Original case data of similar diseases are collected from a preset medical information platform, and the disease type and treatment data are extracted from them. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The medical information set is input into a preset BiLSTM model to obtain the medical feature vector of the disease, and pooling analysis of that vector yields a medical feedforward vector. Finally, based on the medical feedforward vector, cluster analysis is performed on the medical information set using a preset cosine similarity algorithm to obtain the medical service package corresponding to the disease. By clustering the original case data, this application obtains target treatment data with common characteristics and generates service packages for different types of disease from them. This addresses several problems of current medical services: the lack of sustained follow-up services, the difficulty of providing timely and effective feedback, and the lack of effective integration of medical data, which together make the quality of mobile medical services difficult to guarantee. The application effectively improves the efficiency and convenience of medical services, reduces patients' waiting time, and relieves pressure on medical resources.
The service package generation method based on patient data in the embodiments of this application has been described above. The service package generation apparatus based on patient data is described below. Referring to Figure 6, a first embodiment of the service package generation apparatus based on patient data in the embodiments of this application includes:
The collection module 601 is configured to collect original case data for diseases of the same category from a preset medical information platform, and to extract the disease type and treatment data from the original case data;
The de-identification module 602 is configured to de-identify the treatment data to obtain target treatment data;
The fusion module 603 is configured to extract multiple key events from the target treatment data and to fuse the key events to obtain a medical information set for the disease corresponding to the disease type;
The pooling module 604 is configured to input the medical information set into a preset BiLSTM model for vector computation to obtain the medical feature vector of the disease, and to perform pooling analysis on the medical feature vector to obtain a medical feedforward vector;
The clustering module 605 is configured to perform cluster analysis on the medical information set according to the medical feedforward vector, based on a preset cosine similarity algorithm, to obtain a medical service package corresponding to the disease.
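The application does not say how the cosine similarity drives the clustering. It gives no threshold, no linkage rule and no number of clusters. The following minimal Python sketch assumes a greedy single-pass scheme. Each medical feedforward vector joins the first existing cluster whose seed vector is at least `threshold` similar to it; otherwise it starts a new cluster. The function names and the 0.8 threshold are hypothetical illustrations, not the applicant's implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_cluster(vectors: List[Sequence[float]],
                      threshold: float = 0.8) -> List[List[int]]:
    """Greedy single-pass clustering over vector indices (hypothetical scheme).

    Each vector joins the first cluster whose seed (first member) has cosine
    similarity >= threshold with it; otherwise it seeds a new cluster.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


# Hypothetical medical feedforward vectors for four case groups.
feedforward = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05]]
print(threshold_cluster(feedforward))  # [[0, 1], [2, 3]]
```

Each resulting cluster would then correspond to one candidate service package. A production system would more likely use an established method, such as agglomerative clustering with cosine distance. The greedy pass is used here only because it fits in a few lines.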
In the embodiments of this application, original case data for diseases of the same category are collected from a preset medical information platform, and the disease type and treatment data are extracted from the original case data. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The target treatment data are input into a preset BiLSTM model to obtain the medical feature vector of the medical data, and pooling analysis is performed on the medical feature vector to obtain a medical feedforward vector. According to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on original case data, this application obtains target treatment data with common characteristics and generates service packages for different disease types from that data. This addresses three problems in current medical services: they lack sustained follow-up, they find it difficult to provide timely and effective feedback, and they do not effectively integrate medical data. Together these problems mean that the quality of mobile medical services cannot be guaranteed. The approach improves the efficiency and convenience of medical services, shortens patients' waiting times, and relieves pressure on medical resources.
Referring to Figure 7, a second embodiment of the service package generation apparatus based on patient data in the embodiments of this application specifically includes:
The collection module 601 is configured to collect original case data for diseases of the same category from a preset medical information platform, and to extract the disease type and treatment data from the original case data;
The de-identification module 602 is configured to de-identify the treatment data to obtain target treatment data;
The fusion module 603 is configured to extract multiple key events from the target treatment data and to fuse the key events to obtain a medical information set for the disease corresponding to the disease type;
The pooling module 604 is configured to input the medical information set into a preset BiLSTM model for vector computation to obtain the medical feature vector of the disease, and to perform pooling analysis on the medical feature vector to obtain a medical feedforward vector;
The clustering module 605 is configured to perform cluster analysis on the medical information set according to the medical feedforward vector, based on a preset cosine similarity algorithm, to obtain a medical service package corresponding to the disease.
In this embodiment, the service package generation apparatus based on patient data further includes:
The setting module 606 is configured to preset diagnosis and treatment templates for different disease types;
The acquisition module 607 is configured to obtain the time information corresponding to the treatment data and, according to the disease type, to obtain the diagnosis and treatment template corresponding to the treatment data;
The verification module 608 is configured to verify the treatment data against the diagnosis and treatment template and to determine whether the chronological order of the treatment data is correct.
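The verification step is described only as determining whether the chronological order of the treatment data is correct. The Python sketch below shows one plausible reading. Each diagnosis and treatment template lists the expected order of treatment stages for a disease type. A set of records passes if, once sorted by timestamp, its stages never move backwards in that order. The template contents, the stage names and `is_chronologically_valid` are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical diagnosis and treatment templates: disease type -> expected stage order.
TEMPLATES: Dict[str, List[str]] = {
    "appendicitis": ["admission", "diagnosis", "surgery", "discharge"],
}


def is_chronologically_valid(disease_type: str,
                             records: List[Tuple[str, datetime]]) -> bool:
    """Check that time-sorted records never step backwards in the template order."""
    template = TEMPLATES[disease_type]
    rank = {stage: i for i, stage in enumerate(template)}
    last_rank = -1
    for stage, _ in sorted(records, key=lambda r: r[1]):
        if stage not in rank:
            return False  # stage not covered by this disease's template
        if rank[stage] < last_rank:
            return False  # a later-stage event is timestamped before an earlier one
        last_rank = rank[stage]
    return True


ok = [("admission", datetime(2022, 1, 1, 8)), ("diagnosis", datetime(2022, 1, 1, 10)),
      ("surgery", datetime(2022, 1, 2, 9)), ("discharge", datetime(2022, 1, 5, 12))]
bad = [("admission", datetime(2022, 1, 1, 8)), ("surgery", datetime(2022, 1, 1, 9)),
       ("diagnosis", datetime(2022, 1, 1, 10))]
print(is_chronologically_valid("appendicitis", ok))   # True
print(is_chronologically_valid("appendicitis", bad))  # False
```

Repeated entries for the same stage share a rank, so several medication records in a row still pass.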
In this embodiment, the de-identification module 602 is specifically configured to:
Construct a treatment data query database from the original case data;
Add random noise to the sensitive attribute fields in the data tables of the treatment data query database according to a differential privacy algorithm;
De-identify the sensitive attribute fields in the original case data according to the random noise to obtain identifier fields;
Encrypt the identifier fields to obtain the target treatment data.
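The de-identification steps name a differential privacy algorithm and an encryption step but specify neither. The sketch below makes two substitutions, both assumptions:

- For the noise, it uses the Laplace mechanism on a counting query over a sensitive attribute, with sensitivity 1 and noise scale 1/ε.
- For the encryption of identifier fields, it uses keyed HMAC-SHA256 pseudonymization, because the Python standard library has no symmetric cipher. HMAC is one-way, so unlike the encryption the text describes, it cannot be reversed with the key.

All function names, the ε value and the key are illustrative.

```python
import hashlib
import hmac
import random
from typing import Dict, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values: List[str], target: str, epsilon: float,
                rng: random.Random) -> float:
    """Laplace mechanism for a counting query, whose sensitivity is 1."""
    true_count = sum(1 for v in values if v == target)
    return true_count + laplace_noise(1.0 / epsilon, rng)


def pseudonymize(record: Dict[str, str], identifier_fields: List[str],
                 key: bytes) -> Dict[str, str]:
    """Replace identifier fields with a keyed HMAC-SHA256 digest (one-way, not decryptable)."""
    out = dict(record)
    for field in identifier_fields:
        if field in out:
            out[field] = hmac.new(key, out[field].encode("utf-8"), hashlib.sha256).hexdigest()
    return out


rng = random.Random(42)
diagnoses = ["flu", "flu", "cold", "flu"]
print(noisy_count(diagnoses, "flu", epsilon=1.0, rng=rng))  # 3 plus Laplace noise
record = {"name": "Patient A", "patient_id": "P-0001", "diagnosis": "flu"}
print(pseudonymize(record, ["name", "patient_id"], key=b"hypothetical-secret-key"))
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Choosing ε, and a real cipher with key management, would be deployment decisions that the application leaves open.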
In this embodiment, the fusion module 603 is specifically configured to:
Obtain a predefined key event set;
Screen the target treatment data to filter out invalid medical data and obtain valid medical data;
Extract key events from the valid medical data according to the key event set, obtaining multiple key events in the target treatment data.
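The application does not disclose how "invalid medical data" is recognized or what the predefined key event set contains. The Python sketch below makes two assumptions:

- A record is invalid when its event name or timestamp is missing or blank.
- A key event is any valid record whose normalized event name appears in a hypothetical `KEY_EVENTS` vocabulary.

```python
from typing import Dict, List, Optional, Set

Record = Dict[str, Optional[str]]

# Hypothetical predefined key event set.
KEY_EVENTS: Set[str] = {"admission", "diagnosis", "surgery", "medication", "discharge"}


def is_valid(record: Record) -> bool:
    """Assume a record is invalid if its event name or timestamp is missing or blank."""
    event = record.get("event") or ""
    timestamp = record.get("time") or ""
    return bool(event.strip()) and bool(timestamp.strip())


def extract_key_events(records: List[Record]) -> List[Record]:
    """Screen out invalid records, then keep those whose event is a predefined key event."""
    valid = [r for r in records if is_valid(r)]
    return [r for r in valid if (r["event"] or "").strip().lower() in KEY_EVENTS]


records = [
    {"event": "Admission", "time": "2022-01-01T08:00"},
    {"event": "", "time": "2022-01-01T09:00"},           # blank event: invalid
    {"event": "lab_note", "time": "2022-01-02T10:00"},   # valid but not a key event
    {"event": "surgery", "time": None},                  # missing timestamp: invalid
    {"event": "discharge", "time": "2022-01-05T12:00"},
]
print([r["event"] for r in extract_key_events(records)])  # ['Admission', 'discharge']
```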
In this embodiment, the fusion module 603 is further specifically configured to:
Determine the attribute features corresponding to each key event and the weight value of each attribute feature;
Determine score data for the key events from the attribute features and weight values, and obtain, from the score data, the medical information set for the disease corresponding to the disease type.
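The fusion step scores each key event from weighted attribute features, but the application names neither the attributes nor the weights. It also does not say how scores select the members of the medical information set. The following minimal sketch makes these assumptions:

- The score is a weighted linear sum over attribute values already normalized to [0, 1].
- An event joins the set when its score reaches a hypothetical inclusion threshold.

The attribute names and weights are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute weights; attribute values are assumed pre-normalized to [0, 1].
WEIGHTS: Dict[str, float] = {"frequency": 0.5, "severity": 0.3, "cost": 0.2}


def event_score(attributes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of attribute values; a missing attribute contributes 0."""
    return sum(w * attributes.get(name, 0.0) for name, w in weights.items())


def build_information_set(events: Dict[str, Dict[str, float]],
                          weights: Dict[str, float],
                          threshold: float) -> List[Tuple[str, float]]:
    """Keep events scoring at least `threshold`, highest score first."""
    scored = [(name, event_score(attrs, weights)) for name, attrs in events.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


events = {
    "surgery": {"frequency": 0.9, "severity": 0.8, "cost": 1.0},
    "blood_test": {"frequency": 1.0, "severity": 0.1, "cost": 0.1},
    "follow_up_call": {"frequency": 0.2, "severity": 0.1, "cost": 0.0},
}
print(build_information_set(events, WEIGHTS, threshold=0.5))
```

With these weights, surgery scores 0.5 × 0.9 + 0.3 × 0.8 + 0.2 × 1.0 ≈ 0.89 and blood_test scores about 0.55. The follow-up call scores about 0.13 and falls below the threshold.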
In this embodiment, the pooling module 604 includes:
The feature extraction unit 6041, configured to perform feature extraction on the medical feature vector to obtain the target medical features of the medical feature vector;
The dimensionality reduction unit 6042, configured to perform dimensionality reduction on the target medical features to obtain target data of a preset dimension;
The pooling unit 6043, configured to pool the target data to obtain the medical feedforward vector of the target treatment data.
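The pooling module's three units can be pictured as operating on the sequence of BiLSTM hidden states. The following pure-Python illustration rests on three assumptions, none of which the application confirms:

- "Feature extraction" means dropping padded time steps via a mask.
- Dimensionality reduction is a linear projection; a Gaussian random projection is provided as one option.
- Pooling concatenates the element-wise max and mean over the time axis.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_projection(d: int, k: int, seed: int = 0) -> Matrix:
    """Gaussian random projection matrix of shape d x k (one dimensionality-reduction option)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(d)]


def project(states: Matrix, weights: Matrix) -> Matrix:
    """Map each time step h (length d) to h @ W (length k)."""
    k = len(weights[0])
    return [[sum(h[i] * weights[i][j] for i in range(len(h))) for j in range(k)]
            for h in states]


def max_mean_pool(states: Matrix) -> List[float]:
    """Concatenate element-wise max and mean over the time axis (needs >= 1 step)."""
    dims = range(len(states[0]))
    max_part = [max(step[j] for step in states) for j in dims]
    mean_part = [sum(step[j] for step in states) / len(states) for j in dims]
    return max_part + mean_part


def feedforward_vector(states: Matrix, mask: List[int], weights: Matrix) -> List[float]:
    """Drop padded steps (mask == 0), reduce dimension, then pool to one vector."""
    kept = [h for h, m in zip(states, mask) if m]
    return max_mean_pool(project(kept, weights))


# Hypothetical BiLSTM outputs: 3 time steps x 3 dims; the last step is padding.
states = [[1, 2, 3], [3, 0, 1], [9, 9, 9]]
keep_first_two = [[1, 0], [0, 1], [0, 0]]
print(feedforward_vector(states, [1, 1, 0], keep_first_two))  # [3, 2, 2.0, 1.0]
```

Max pooling keeps the strongest signal per dimension and mean pooling keeps the average. Concatenating both is a common choice for sentence-level vectors from recurrent encoders. In practice the projection would usually be learned rather than random.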
In this embodiment, the dimensionality reduction unit 6042 is specifically configured to:
Obtain the frequency of the target medical features;
Extract the first data corresponding to the target medical features from the target treatment data;
Process the first data of different types to obtain standard data;
Perform dimensionality reduction on the standard data to obtain target data of the preset dimension.
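The four sub-steps of the dimensionality reduction unit can be read as frequency-based feature selection over standardized values. The sketch below assumes the following:

- Each record maps feature names to numeric values, and absent features are filled with 0.0.
- Each feature column is converted to z-scores, so that features in different units become comparable ("standard data").
- The k most frequent features are kept as the preset dimension.

The function name and this interpretation are hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def reduce_to_preset_dim(records: List[Dict[str, float]],
                         k: int) -> Tuple[List[str], List[List[float]]]:
    # Step 1: frequency of each medical feature across records.
    counts = Counter(name for record in records for name in record)
    # Step 2: extract each feature's values ("first data"); absent values become 0.0.
    columns = {name: [float(r.get(name, 0.0)) for r in records] for name in counts}
    # Step 3: z-score each column so that features in different units are comparable.
    standard: Dict[str, List[float]] = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        standard[name] = [0.0] * len(values) if std == 0 else [(v - mean) / std for v in values]
    # Step 4: keep the k most frequent features (ties broken by name) as the preset dimension.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    features = ranked[:k]
    matrix = [[standard[f][i] for f in features] for i in range(len(records))]
    return features, matrix


records = [
    {"glucose": 5.0, "bp": 120.0},
    {"glucose": 7.0, "bp": 140.0, "rare_marker": 1.0},
    {"glucose": 6.0, "bp": 130.0},
]
features, matrix = reduce_to_preset_dim(records, k=2)
print(features)  # ['bp', 'glucose']
```

Here `rare_marker` appears in only one record and is dropped. The two retained columns are z-scored, so a blood-pressure value of 140 and a glucose value of 7.0 both map to about 1.22.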
In the embodiments of this application, original case data for diseases of the same category are collected from a preset medical information platform, and the disease type and treatment data are extracted from the original case data. The treatment data are de-identified to obtain target treatment data. Multiple key events are extracted from the target treatment data and fused to obtain a medical information set for the disease corresponding to the disease type. The target treatment data are input into a preset BiLSTM model to obtain the medical feature vector of the medical data, and pooling analysis is performed on the medical feature vector to obtain a medical feedforward vector. According to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease. By performing cluster analysis on original case data, this application obtains target treatment data with common characteristics and generates service packages for different disease types from that data. This addresses three problems in current medical services: they lack sustained follow-up, they find it difficult to provide timely and effective feedback, and they do not effectively integrate medical data. Together these problems mean that the quality of mobile medical services cannot be guaranteed. The approach improves the efficiency and convenience of medical services, shortens patients' waiting times, and relieves pressure on medical resources.
Figures 6 and 7 above describe the service package generation apparatus based on patient data in the embodiments of this application in detail from the perspective of modular functional entities. The service package generation device based on patient data in the embodiments of this application is described in detail below from the perspective of hardware processing.
Figure 8 is a schematic structural diagram of a service package generation device based on patient data provided by an embodiment of this application. The device 800 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 810 (for example, one or more processors), a memory 820, and one or more storage media 830 (for example, one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may provide transient or persistent storage. A program stored in the storage medium 830 may include one or more modules (not shown), and each module may include a series of instruction operations for the device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the device 800, the series of instruction operations in the storage medium 830. This implements the steps of the service package generation method based on patient data provided by the method embodiments above.
The device 800 may further include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD. Those skilled in the art will understand that the structure shown in Figure 8 does not limit the service package generation device based on patient data provided in this application. The device may include more or fewer components than shown, combine certain components, or arrange its components differently.
This application further provides a computer-readable storage medium, which may be non-volatile or volatile. Instructions are stored in the computer-readable storage medium; when run on a computer, they cause the computer to execute the steps of the service package generation method based on patient data described above.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above correspond to the processes in the foregoing method embodiments. They are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application may be embodied as a software product. This covers the solution in essence, the part that contributes to the prior art, or all or part of the solution. The computer software product is stored in a storage medium. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. This application has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments, or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (20)
- A service package generation method based on patient data, wherein the method comprises: collecting original case data for diseases of the same category from a preset medical information platform, and extracting the disease type and treatment data from the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events from the target treatment data, and fusing the key events to obtain a medical information set for the disease corresponding to the disease type; inputting the medical information set into a preset BiLSTM model for vector computation to obtain a medical feature vector of the disease, and performing pooling analysis on the medical feature vector to obtain a medical feedforward vector; and performing, according to the medical feedforward vector, cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
- The service package generation method based on patient data according to claim 1, wherein, after collecting original case data for diseases of the same category from the preset medical information platform and extracting the disease type and treatment data from the original case data, the method further comprises: presetting diagnosis and treatment templates for different disease types; obtaining time information corresponding to the treatment data; obtaining, according to the disease type, the diagnosis and treatment template corresponding to the treatment data; and verifying the treatment data against the diagnosis and treatment template to determine whether the chronological order of the treatment data is correct.
- The service package generation method based on patient data according to claim 1, wherein de-identifying the treatment data to obtain target treatment data comprises: constructing a treatment data query database from the original case data; adding random noise to sensitive attribute fields in the data tables of the treatment data query database according to a differential privacy algorithm; de-identifying the sensitive attribute fields in the original case data according to the random noise to obtain identifier fields; and encrypting the identifier fields to obtain the target treatment data.
- The service package generation method based on patient data according to claim 1, wherein extracting multiple key events from the target treatment data comprises: obtaining a predefined key event set; screening the target treatment data to filter out invalid medical data and obtain valid medical data; and extracting from the valid medical data according to the key event set to obtain multiple key events in the target treatment data.
- The service package generation method based on patient data according to claim 1, wherein fusing the key events to obtain a medical information set for the disease corresponding to the disease type comprises: determining attribute features corresponding to the key events and weight values of the attribute features; and determining score data of the key events from the attribute features and the weight values, and obtaining, from the score data, the medical information set for the disease corresponding to the disease type.
- The service package generation method based on patient data according to claim 1, wherein performing pooling analysis on the medical feature vector to obtain a medical feedforward vector comprises: performing feature extraction on the medical feature vector to obtain target medical features of the medical feature vector; performing dimensionality reduction on the target medical features to obtain target data of a preset dimension; and pooling the target data to obtain the medical feedforward vector of the target treatment data.
- The service package generation method based on patient data according to claim 6, wherein performing dimensionality reduction on the target medical features to obtain target data of a preset dimension comprises: obtaining the frequency of the target medical features; extracting first data corresponding to the target medical features from the target treatment data; processing the first data of different types to obtain standard data; and performing dimensionality reduction on the standard data to obtain the target data of the preset dimension.
- A service package generation apparatus based on patient data, wherein the apparatus comprises: a collection module, configured to collect original case data for diseases of the same category from a preset medical information platform and to extract the disease type and treatment data from the original case data; a de-identification module, configured to de-identify the treatment data to obtain target treatment data; a fusion module, configured to extract multiple key events from the target treatment data and to fuse the key events to obtain a medical information set for the disease corresponding to the disease type; a pooling module, configured to input the medical information set into a preset BiLSTM model for vector computation to obtain a medical feature vector of the disease, and to perform pooling analysis on the medical feature vector to obtain a medical feedforward vector; and a clustering module, configured to perform cluster analysis on the medical information set according to the medical feedforward vector, based on a preset cosine similarity algorithm, to obtain a medical service package corresponding to the disease.
- A service package generation device based on patient data, wherein the device comprises a memory and at least one processor, the memory storing instructions, and the memory and the at least one processor being interconnected by lines; the at least one processor invokes the instructions in the memory so that the device performs the following steps of the service package generation method based on patient data: collecting original case data for diseases of the same category from a preset medical information platform, and extracting the disease type and treatment data from the original case data; de-identifying the treatment data to obtain target treatment data; extracting multiple key events from the target treatment data, and fusing the key events to obtain a medical information set for the disease corresponding to the disease type; inputting the medical information set into a preset BiLSTM model for vector computation to obtain a medical feature vector of the disease, and performing pooling analysis on the medical feature vector to obtain a medical feedforward vector; and performing, according to the medical feedforward vector, cluster analysis on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
- The service package generation device based on patient data according to claim 9, wherein, when the service package generation program based on patient data is executed by the processor, the following steps are further performed after the step of collecting original case data for diseases of the same category from the preset medical information platform and extracting the disease type and treatment data from the original case data: presetting diagnosis and treatment templates for different disease types; obtaining time information corresponding to the treatment data; obtaining, according to the disease type, the diagnosis and treatment template corresponding to the treatment data; and verifying the treatment data against the diagnosis and treatment template to determine whether the chronological order of the treatment data is correct.
- The service package generation device based on patient data according to claim 9, wherein, when the service package generation program based on patient data is executed by the processor to implement the step of de-identifying the treatment data to obtain target treatment data, the following steps are further performed: constructing a treatment data query database from the original case data; adding random noise to sensitive attribute fields in the data tables of the treatment data query database according to a differential privacy algorithm; de-identifying the sensitive attribute fields in the original case data according to the random noise to obtain identifier fields; and encrypting the identifier fields to obtain the target treatment data.
- The service package generation device based on patient data according to claim 9, wherein, when the service package generation program based on patient data is executed by the processor to implement the step of extracting multiple key events from the target treatment data, the following steps are further performed: obtaining a predefined key event set; screening the target treatment data to filter out invalid medical data and obtain valid medical data; and extracting from the valid medical data according to the key event set to obtain multiple key events in the target treatment data.
- The service package generation device based on patient data according to claim 9, wherein, when the service package generation program based on patient data is executed by the processor to implement the step of fusing the key events to obtain a medical information set for the disease corresponding to the disease type, the following steps are further performed: determining attribute features corresponding to the key events and weight values of the attribute features; and determining score data of the key events from the attribute features and the weight values, and obtaining, from the score data, the medical information set for the disease corresponding to the disease type.
- The service package generation device based on patient data according to claim 9, wherein, when the service package generation program based on patient data is executed by the processor to implement the step of performing pooling analysis on the medical feature vector to obtain a medical feedforward vector, the following steps are further performed: performing feature extraction on the medical feature vector to obtain target medical features of the medical feature vector; performing dimensionality reduction on the target medical features to obtain target data of a preset dimension; and pooling the target data to obtain the medical feedforward vector of the target treatment data.
- 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,其中,所述计算机程序被处理器执行时实现如下所述的基于病患数据的服务包生成方法的步骤:A computer-readable storage medium, a computer program stored on the computer-readable storage medium, wherein when the computer program is executed by a processor, the steps of the method for generating a service package based on patient data are implemented as follows:从预设医疗信息平台中采集同类疾病的原始病例数据,并提取所述原始病例数据中的疾病类型和治疗数据;Collect original case data of similar diseases from the preset medical information platform, and extract the disease type and treatment data in the original case data;对所述治疗数据进行去标识化处理,得到目标治疗数据;De-identify the treatment data to obtain target treatment data;提取所述目标治疗数据中的多个关键事件,并将所述关键事件进行融合处理,得到与所述疾病类型对应的疾病的医疗信息集合;Extract multiple key events in the target treatment data, and perform fusion processing on the key events to obtain a medical information set of diseases corresponding to the disease type;将所述医疗信息集合输入预设Bilstm模型中进行向量计算,得到所述疾病的医学特征向量,并对所述医学特征向量进行池化分析,得到医学前馈向量;Input the medical information set into the preset Bilstm model for vector calculation to obtain the medical feature vector of the disease, and perform pooling analysis on the medical feature vector to obtain a medical feedforward vector;根据所述医学前馈向量,基于预设余弦相似度算法对所述医疗信息集合进行聚类分析,得到与所述疾病对应的医疗服务包。According to the medical feedforward vector, cluster analysis is performed on the medical information set based on a preset cosine similarity algorithm to obtain a medical service package corresponding to the disease.
- The computer-readable storage medium according to claim 15, wherein, after the step of collecting original case data of diseases of the same type from the preset medical information platform and extracting the disease type and treatment data from the original case data, the computer program, when executed by the processor, further performs the following steps:
  - presetting diagnosis and treatment templates for different disease types;
  - obtaining time information corresponding to the treatment data;
  - obtaining, according to the disease type, the diagnosis and treatment template corresponding to the treatment data; and
  - verifying the treatment data against the diagnosis and treatment template to determine whether the chronological order of the treatment data is correct.
- The computer-readable storage medium according to claim 15, wherein, when the computer program is executed by the processor to perform the step of de-identifying the treatment data to obtain target treatment data, the following steps are further performed:
  - constructing a treatment data query database from the original case data;
  - adding random noise, according to a differential privacy algorithm, to sensitive attribute fields in the data tables of the treatment data query database;
  - de-identifying, according to the random noise, the sensitive attribute fields in the original case data to obtain identifier fields; and
  - encrypting the identifier fields to obtain the target treatment data.
- The computer-readable storage medium according to claim 15, wherein, when the computer program is executed by the processor to perform the step of extracting multiple key events from the target treatment data, the following steps are further performed:
  - obtaining a predefined key event set;
  - screening the target treatment data to filter out invalid medical data and obtain valid medical data; and
  - extracting from the valid medical data according to the key event set to obtain the multiple key events in the target treatment data.
- The computer-readable storage medium according to claim 15, wherein, when the computer program is executed by the processor to perform the step of fusing the key events to obtain a medical information set for the disease corresponding to the disease type, the following steps are further performed:
  - determining the attribute features corresponding to the key events and the weight values of the attribute features; and
  - determining score data for the key events according to the attribute features and the weight values, and obtaining, according to the score data, the medical information set for the disease corresponding to the disease type.
- The computer-readable storage medium according to claim 15, wherein, when the computer program is executed by the processor to perform the step of performing pooling analysis on the medical feature vector to obtain a medical feedforward vector, the following steps are further performed:
  - performing feature extraction on the medical feature vector to obtain target medical features of the medical feature vector;
  - performing dimensionality reduction on the target medical features to obtain target data of a preset dimension; and
  - pooling the target data to obtain the medical feedforward vector of the target treatment data.
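The chronological check in the claim on diagnosis and treatment templates can be illustrated with a minimal Python sketch. It assumes two things the patent does not specify. First, a template is simply an ordered list of treatment stages per disease type. Second, each treatment record carries a stage name and a date. `TEMPLATES`, `TreatmentRecord`, `order_is_valid`, and the stage names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical diagnosis and treatment template: expected stage order per disease type.
TEMPLATES = {
    "pneumonia": ["admission", "diagnosis", "medication", "discharge"],
}


@dataclass
class TreatmentRecord:
    stage: str   # treatment stage the record belongs to (assumed field)
    when: date   # time information attached to the record


def order_is_valid(disease_type: str, records: List[TreatmentRecord]) -> bool:
    """Return True if the records, sorted by time, follow the template's stage order."""
    rank = {stage: i for i, stage in enumerate(TEMPLATES[disease_type])}
    last_rank = -1
    # Stable sort by date; same-day records keep their input order.
    for record in sorted(records, key=lambda r: r.when):
        if record.stage not in rank:
            return False  # stage not covered by the template
        if rank[record.stage] < last_rank:
            return False  # an earlier template stage occurs after a later one
        last_rank = rank[record.stage]
    return True
```

Skipped stages are tolerated here; a stricter check could also require that every template stage appears.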
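For the de-identification claim, a minimal sketch can assume that the differential privacy algorithm is the standard Laplace mechanism (noise scale = sensitivity / epsilon), applied directly to numeric sensitive fields. The sketch also swaps in keyed HMAC-SHA256 pseudonymization for the unspecified encryption step, because the Python standard library has no symmetric cipher. Unlike real encryption, the digest cannot be reversed. The field names, `epsilon`, `sensitivity`, and the key are illustrative assumptions, not values from the patent.

```python
import hashlib
import hmac
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential samples."""
    # expovariate() takes a rate; rate = 1/scale gives exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Keyed HMAC-SHA256 digest, a stand-in (assumption) for encrypting identifiers."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(row: dict, noisy_fields: List[str], id_fields: List[str], key: bytes,
               epsilon: float = 1.0, sensitivity: float = 1.0,
               seed: Optional[int] = None) -> dict:
    """Return a copy of `row` with Laplace noise on numeric fields and hashed identifiers."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # Laplace mechanism scale
    out = dict(row)
    for field in noisy_fields:
        out[field] = float(row[field]) + laplace_noise(scale, rng)
    for field in id_fields:
        out[field] = pseudonymize(str(row[field]), key)
    return out
```

For example, `deidentify({"patient_id": "P001", "age": 54}, ["age"], ["patient_id"], key=b"demo-key", epsilon=0.5)` perturbs the age and replaces the ID with a 64-character hex digest. A smaller `epsilon` means more noise and stronger privacy.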
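The screening and key-event extraction claim can be sketched as two successive filters:

1. Drop records that are missing required fields (one possible definition of "invalid medical data").
2. Keep only records whose event type is in a predefined key event set.

`KEY_EVENTS`, `REQUIRED_FIELDS`, and the record layout are hypothetical; the patent does not list the key events.

```python
from typing import Iterable, List

# Hypothetical predefined key event set and validity rule.
KEY_EVENTS = frozenset({"diagnosis", "surgery", "medication", "lab_test"})
REQUIRED_FIELDS = ("event", "timestamp")


def is_valid(record: dict) -> bool:
    """Treat a record as invalid if any required field is missing or empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def extract_key_events(records: Iterable[dict],
                       key_events: frozenset = KEY_EVENTS) -> List[dict]:
    """Screen out invalid records, then keep those whose event is a key event."""
    valid = [r for r in records if is_valid(r)]            # screening step
    return [r for r in valid if r["event"] in key_events]  # key-event extraction
```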
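The fusion claim scores each key event from its attribute features and their weight values. A minimal sketch makes two assumptions. First, the score is a weighted sum of attribute values. Second, the medical information set consists of the events at or above a score threshold, ranked by score. The attribute names, weights, and threshold are hypothetical.

```python
from typing import Dict, List

# Hypothetical attribute weights; the patent does not name attributes or values.
WEIGHTS = {"severity": 0.5, "cost": 0.2, "frequency": 0.3}


def score_event(attributes: Dict[str, float],
                weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of an event's attribute features (missing attributes count as 0)."""
    return sum(w * attributes.get(name, 0.0) for name, w in weights.items())


def fuse_events(events: List[dict], threshold: float) -> List[dict]:
    """Score each key event and keep those at or above the threshold, highest first."""
    scored = [dict(e, score=score_event(e["attributes"])) for e in events]
    kept = [e for e in scored if e["score"] >= threshold]
    return sorted(kept, key=lambda e: e["score"], reverse=True)
```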
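The pooling-analysis claims (feature extraction, dimensionality reduction, then pooling) can be sketched on a sequence of BiLSTM output vectors. The patent names these three steps but not the concrete operations, so this sketch assumes the following:

- feature extraction selects a subset of feature columns;
- dimensionality reduction is a fixed linear projection to the preset dimension;
- pooling is max pooling over time steps.

The function names and the projection matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # one row per time step (BiLSTM output), one column per feature


def extract_features(states: Matrix, keep: List[int]) -> Matrix:
    """Feature extraction (assumed): keep only the selected feature columns."""
    return [[row[i] for i in keep] for row in states]


def reduce_dim(features: Matrix, projection: Matrix) -> Matrix:
    """Project each d-dim row to k dims with a (d x k) projection matrix."""
    k = len(projection[0])
    return [[sum(row[i] * projection[i][j] for i in range(len(row))) for j in range(k)]
            for row in features]


def max_pool(data: Matrix) -> List[float]:
    """Max pooling over time steps: one fixed-length vector per sequence."""
    return [max(column) for column in zip(*data)]


def feedforward_vector(states: Matrix, keep: List[int], projection: Matrix) -> List[float]:
    """Chain the three steps to produce the (assumed) medical feedforward vector."""
    return max_pool(reduce_dim(extract_features(states, keep), projection))
```

In a real model, the projection would usually be a learned layer, and the choice between max and mean pooling would be a design decision. The claims do not fix either.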
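The cosine-similarity cluster analysis in the storage-medium claim could look like the following single-pass sketch. It assumes each case's medical feedforward vector has already been computed. A case joins the first cluster whose running-mean centroid has cosine similarity at or above a threshold; otherwise it starts a new cluster. The greedy strategy and the 0.9 default threshold are assumptions, since the patent only names a cosine similarity algorithm. Each resulting group of cases would be the starting point for one service package.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_cosine(vectors: Dict[str, List[float]],
                      threshold: float = 0.9) -> List[List[str]]:
    """Greedy single-pass clustering of cases by cosine similarity to cluster centroids."""
    clusters: List[List[str]] = []
    centroids: List[List[float]] = []
    for name, vec in vectors.items():
        for idx, centroid in enumerate(centroids):
            if cosine(vec, centroid) >= threshold:
                clusters[idx].append(name)
                n = len(clusters[idx])
                # Running-mean update of the cluster centroid.
                centroids[idx] = [c + (v - c) / n for c, v in zip(centroid, vec)]
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([name])
            centroids.append(list(vec))
    return clusters
```

Because the pass is order-dependent, a production system might prefer agglomerative clustering with a cosine metric. The sketch keeps to the standard library.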
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210671458.4 | 2022-06-15 | | |
CN202210671458.4A CN115171830A (en) | 2022-06-15 | 2022-06-15 | Patient data-based service package generation method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023240837A1 (en) | 2023-12-21 |
Family ID: 83486246
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/121728 WO2023240837A1 (en) | 2022-06-15 | 2022-09-27 | Service package generation method, apparatus and device based on patient data, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115171830A (en) |
WO (1) | WO2023240837A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118609743A (en) * | 2024-08-07 | 2024-09-06 | 深圳明灏生物科技有限公司 | Medical data management method, system and storage medium based on artificial intelligence |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116631550B (en) * | 2023-07-26 | 2023-11-28 | 深圳爱递医药科技有限公司 | Data management and logic checking method for clinical trial and medical system thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016095684A (en) * | 2014-11-14 | 2016-05-26 | Kddi株式会社 | Prediction model construction device and program |
CN109360658A (en) * | 2018-11-01 | 2019-02-19 | 北京航空航天大学 | Disease pattern mining method and device based on a word vector model |
US20200152320A1 (en) * | 2018-11-12 | 2020-05-14 | Roche Molecular Systems, Inc. | Medical treatment metric modelling based on machine learning |
CN113921122A (en) * | 2021-10-11 | 2022-01-11 | 杨孟帆 | Medical care resource distribution system based on intelligent medical treatment |
- 2022-06-15: CN application CN202210671458.4A filed, published as CN115171830A (status: active, pending)
- 2022-09-27: PCT application PCT/CN2022/121728 filed, published as WO2023240837A1 (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN115171830A (en) | 2022-10-11 |
Similar Documents
Publication | Title |
---|---|
Hung et al. | Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database |
WO2023240837A1 (en) | Service package generation method, apparatus and device based on patient data, and storage medium |
CN112700838A (en) | Big data-based medication scheme recommendation method and device and related equipment |
US20140025393A1 (en) | System and method for providing clinical decision support |
US20150248537A1 (en) | Personalized Health Score Generator |
US20130332194A1 (en) | Methods and systems for adaptive ehr data integration, query, analysis, reporting, and crowdsourced ehr application development |
Lee et al. | Unlocking the potential of electronic health records for health research |
US20210057106A1 (en) | System and Method for Digital Therapeutics Implementing a Digital Deep Layer Patient Profile |
US20140122126A1 (en) | Clinical information processing |
US11715569B2 (en) | Intent-based clustering of medical information |
Awrahman et al. | A review of the role and challenges of big data in healthcare informatics and analytics |
Gray et al. | Volume-outcome associations for parathyroid surgery in England: Analysis of an administrative data set for the Getting It Right First Time program |
Raheja et al. | Data analysis and its importance in health care |
Kumar et al. | Review paper on Big Data in healthcare informatics |
Bertl et al. | Finding indicator diseases of psychiatric disorders in BigData using clustered association rule mining |
Bogie D Phil et al. | Development of predictive informatics tool using electronic health records to inform personalized evidence-based pressure injury management for Veterans with spinal cord injury |
CN116721730B (en) | Whole-course patient management system based on digital therapy |
US20200234315A1 (en) | Systems and methods for patient retention in network through referral analytics |
KR20180002229A (en) | An agent apparatus for constructing database for dementia information and the operating method by using the same |
Chignell et al. | Nonconfidential patient types in emergency clinical decision support |
Tyan et al. | Private equity acquisition of oncology clinics in the US from 2003 to 2022 |
WO2014113730A1 (en) | Systems and methods for patient retention in network through referral analytics |
Yee et al. | Big data: Its implications on healthcare and future steps |
Alegría et al. | Performance metrics of substance use disorder care among Medicaid enrollees in New York, New York |
Al-Shanableh et al. | Predicting the number of multiple chronic conditions in arizona state using data mining algorithms |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22946509; Country of ref document: EP; Kind code of ref document: A1 |