CN112562807A - Medical data analysis method, apparatus, device, storage medium, and program product - Google Patents

Info

Publication number
CN112562807A
Authority
CN
China
Prior art keywords
target object
medical
data
objects
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011441333.XA
Other languages
Chinese (zh)
Other versions
CN112562807B (en)
Inventor
王春宇
夏源
施振辉
黄海峰
陆超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011441333.XA
Publication of CN112562807A
Application granted
Publication of CN112562807B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure provides a method for analyzing medical data, which relates to the field of deep learning and, in particular, to natural language processing. The method includes: extracting a plurality of target objects from the medical data; for each target object, selecting at least one reference object from a preset plurality of reference objects based on the semantic similarity and medical term correlation between the target object and the reference objects, and associating the selected at least one reference object with the target object; and receiving data to be analyzed that includes at least one target object, and analyzing that data based on the reference information of the reference object associated with each target object in it. The present disclosure also provides an apparatus, a device, a storage medium, and a program product for analyzing medical data.

Description

Medical data analysis method, apparatus, device, storage medium, and program product
Technical Field
The present disclosure relates to the field of deep learning, specifically to the field of natural language processing, and more specifically to a method, an apparatus, a device, a storage medium, and a computer program product for analyzing medical data.
Background
Medical reports contain many medical terms and data indicators, and conveying the information they contain requires interpretation by a medical professional. With the rapid advance of healthcare informatization, building hospital information systems centered on electronic medical records has become an important part of medical innovation, so the demand for automatic interpretation of medical reports is increasingly urgent.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, a device, a storage medium and a computer program product for analyzing medical data.
According to a first aspect, there is provided a method of analysis of medical data, comprising:
extracting a plurality of target objects from the medical data;
for each target object, selecting at least one reference object from a plurality of preset reference objects based on semantic similarity and medical term correlation between the target object and the reference objects, and associating the selected at least one reference object with the target object; and
receiving data to be analyzed that contains at least one target object, and analyzing the data to be analyzed based on the reference information of the reference object associated with the target object in the data to be analyzed.
According to a second aspect, there is also provided an apparatus for analyzing medical data, comprising:
an extraction module configured to extract a plurality of target objects from the medical data;
a comparison module configured to select, for each target object, at least one reference object from a plurality of preset reference objects based on the semantic similarity and medical term correlation between the target object and the reference objects, and to associate the selected at least one reference object with the target object; and
an analysis module configured to receive data to be analyzed that contains at least one target object, and to analyze the data to be analyzed based on the reference information of the reference object associated with the target object in the data to be analyzed.
According to a third aspect, there is also provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to the first aspect.
According to a fourth aspect, there is also provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the first aspect.
According to a fifth aspect, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
According to the embodiments of the present disclosure, a plurality of target objects are extracted from medical data and associated with preset reference objects based on semantic similarity and medical term correlation. This enables automatic interpretation of medical data, reduces the labor and time required to read medical reports, and improves the accuracy of interpretation.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a flow chart of a method of analysis of medical data according to an embodiment of the present disclosure;
FIG. 2 is an example of medical data and target objects according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of associating target objects with reference objects according to an embodiment of the present disclosure;
FIG. 4 is an example architecture for analyzing medical data according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an apparatus for analysis of medical data according to another embodiment of the present disclosure; and
FIG. 6 is a block diagram of an electronic device that may be used to implement the method of analysis of medical data of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flow chart of a method 100 of analysis of medical data according to an embodiment of the present disclosure. As shown in Fig. 1, the method 100 of analyzing medical data according to an embodiment of the present disclosure includes the following steps.
In step S110, a plurality of target objects are extracted from the medical data.
In step S120, for each target object, at least one reference object is selected from a plurality of reference objects based on semantic similarity and medical term correlation between the target object and a plurality of preset reference objects, and the selected at least one reference object is associated with the target object.
In step S130, data to be analyzed including at least one target object is received, and the data to be analyzed is analyzed based on reference information of a reference object associated with the target object in the data to be analyzed.
In particular, the medical data includes various medical reports, which comprise at least one of laboratory test reports and examination reports. Laboratory test reports include, for example, blood routine and urine routine tests, and examination reports include, for example, X-ray and B-mode ultrasound. Analyzing the medical data may include interpreting medical reports, which involves a variety of data analysis and processing tasks, such as medication recommendation, mapping of drug entities to a standard, and recommendation of diagnostic reasons. Mapping of certain key elements, such as the diseases, symptoms, and signs needed for diagnosis, may also be involved. When medical reports are interpreted automatically, reports from different hospitals generally contain different examination types, examination values, examination result indicators, and so on. According to an embodiment, information may be extracted from a large volume of medical data, and objects (also referred to as entities) may be generated from that information according to a preset structure. The structure of an object may be set as needed; for example, an examination type together with one examination item under that type may form one object (entity). According to an embodiment, an object extracted from the medical data is referred to as a target object, and a plurality of objects serving as a comparison standard (referred to as reference objects) may be preset for the medical data. The medical data may then be analyzed using the reference objects as the standard. In an embodiment of the present disclosure, in step S110, medical data generated by a hospital within a set period of time may be acquired, for example, all medical data of the hospital over one or more years.
According to an embodiment, the medical data may include, but is not limited to, medical reports.
According to an embodiment, a reference object serving as a comparison standard may use the same expression as the target object or a different one. For example, when the target object extracted from the medical data is "orthopedics", the preset reference object may be the same expression, "orthopedics", or a different expression, such as "bone surgery". Similarly, when the target object is "intestinal clinic", the preset reference object may be the same expression, "intestinal clinic", or a different expression, such as "intestinal department outpatient clinic". The preset reference objects may be obtained by training a model on sample data.
According to an embodiment, in step S120, the correspondence between a target object and the reference objects may be determined based on the semantic similarity and medical term correlation between them. For example, "orthopedics" and "bone surgery" have the same meaning in medicine, and in embodiments of the present disclosure this correspondence can be determined from their semantic similarity and medical term correlation. Likewise, "back pain", "dull back pain", and "pain in the back" can be determined, on the same basis, to be medical expressions with the same meaning. That is, in embodiments of the present disclosure, at least one of the preset reference objects may be selected and associated with each extracted target object.
Through steps S110 and S120, the reference objects associated with all medical data in the set time period may be selected from the preset reference objects, thereby constructing an application scenario that matches the actual situation of the hospital. Other medical data of the hospital may then be analyzed based on the selected associated reference objects. According to an embodiment, the reference information of each reference object may be stored together with the preset reference object. In step S130, the data to be analyzed, which contains at least one target object, may be part of the medical data in the set time period used to construct the application scenario, for example one or more of the medical reports. Alternatively, the data to be analyzed may be medical data from another period, for example one or more newly generated medical reports. Once the correspondence between target objects and reference objects has been obtained, the reference information of the reference object associated with each target object may be used to analyze the data. For example, suppose a target object in a medical report is <ophthalmology, left eye vision, right eye vision> and the associated reference object is <ophthalmology, left eye corrected vision, right eye corrected vision>. The reference information stored with that reference object may state that a corrected vision equal to 1.0 is considered standard vision; a corrected vision between 0.8 and 1.0 is considered normal; and a corrected vision less than 0.8 or greater than 1.2 is considered under-corrected or over-corrected. The target objects in the medical report may be analyzed based on this information.
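The vision example can be read as a small rule table attached to the reference object. The sketch below encodes it under stated assumptions: "between 0.8 and 1.0" is read as 0.8 ≤ value < 1.0, values below 0.8 are treated as under-corrected and values above 1.2 as over-corrected, and the range (1.0, 1.2], which the example does not cover, returns `None`. The function names are hypothetical.

```python
from typing import Dict, Optional


def classify_corrected_vision(value: float) -> Optional[str]:
    """Classify one corrected-vision value with the example thresholds.

    Assumptions: 0.8 <= value < 1.0 is "normal", < 0.8 is under-corrected,
    > 1.2 is over-corrected; (1.0, 1.2] is not covered and yields None.
    """
    if value == 1.0:
        return "standard"
    if 0.8 <= value < 1.0:
        return "normal"
    if value < 0.8:
        return "under-corrected"
    if value > 1.2:
        return "over-corrected"
    return None  # not covered by the stored reference information


def analyze_vision(left: float, right: float) -> Dict[str, Optional[str]]:
    """Evaluate both eyes against the associated reference object's rules."""
    return {
        "left eye corrected vision": classify_corrected_vision(left),
        "right eye corrected vision": classify_corrected_vision(right),
    }


print(analyze_vision(0.9, 0.6))
# {'left eye corrected vision': 'normal', 'right eye corrected vision': 'under-corrected'}
```

Returning `None` for uncovered values, rather than guessing, keeps gaps in the stored reference information visible to the caller.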
According to the embodiments of the present disclosure, an application scenario matching the actual situation of a hospital can be configured from the hospital's real medical data, and each target object can be associated with at least one reference object based on the semantic similarity and medical term correlation between the target object and the preset reference objects. This enables automatic interpretation of the data contained in medical reports, reduces the labor and material cost of interpreting medical reports, and raises the hospital's level of medical automation.
Fig. 2 is an example of medical data and target objects according to an embodiment of the present disclosure. According to an embodiment, the medical data may include, but is not limited to, medical reports. Fig. 2 shows a segment of a medical report, which includes an examination type and at least one examination item under that type. In Fig. 2, blood routine is an example of an examination type, and red blood cell count (RBC), hemoglobin (HGB), and the like are examples of examination items. According to an embodiment, a target object may be constructed from an examination type in a medical report and at least one examination item under that type. For the report in Fig. 2, target objects such as <blood routine, red blood cell count (RBC)>, <blood routine, hemoglobin (HGB)>, or <blood routine, red blood cell count (RBC), hemoglobin (HGB)> can be constructed. The medical report also includes the result and unit of each examination item, which may serve as the target information of the corresponding target object. For example, the result and unit of the examination item "red blood cell count (RBC)" are "4.42 × 10^12/L", so the target information of the target object <blood routine, red blood cell count (RBC)> includes "4.42 × 10^12/L".
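A minimal sketch of how such target objects might be represented, assuming report rows have already been parsed into (examination type, examination item, result, unit) tuples. `TargetObject`, `single_item_objects`, and `multi_item_objects` are hypothetical names; the sketch shows both the single-item and the multi-item object forms described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TargetObject:
    """<examination type, examination item(s)> extracted from a report."""
    exam_type: str
    exam_items: Tuple[str, ...]


def single_item_objects(
    rows: List[Tuple[str, str, str, str]],
) -> Dict[TargetObject, Tuple[str, str]]:
    """Map each <type, item> object to its target information (result, unit)."""
    return {
        TargetObject(exam_type, (item,)): (result, unit)
        for exam_type, item, result, unit in rows
    }


def multi_item_objects(pairs: List[Tuple[str, str]]) -> List[TargetObject]:
    """Group items by examination type into <type, item1, item2, ...> objects."""
    grouped: Dict[str, List[str]] = {}
    for exam_type, item in pairs:
        grouped.setdefault(exam_type, []).append(item)
    return [TargetObject(t, tuple(items)) for t, items in grouped.items()]


rbc_row = ("blood routine", "red blood cell count (RBC)", "4.42", "10^12/L")
print(single_item_objects([rbc_row]))
print(multi_item_objects([
    ("blood routine", "red blood cell count (RBC)"),
    ("blood routine", "hemoglobin (HGB)"),
]))
```

Freezing the dataclass makes target objects hashable, so they can later serve as keys in a target-to-reference comparison table.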
Fig. 3 is a schematic diagram of associating target objects with reference objects according to an embodiment of the present disclosure. The left side of Fig. 3 shows a plurality of received medical reports $F_1, F_2, \ldots, F_N$, each of which may include a plurality of target objects. For example, medical report $F_1$ includes examination type $A_1$, with examination items $a_1$ and $a_2$ under $A_1$. Medical report $F_2$ includes examination types $A_1$ and $A_2$, with examination items $a_1$ and $a_2$ under $A_1$ and examination item $a_3$ under $A_2$. Medical report $F_3$ includes examination type $A_3$, with examination items $a_4$, $a_5$, and $a_6$ under $A_3$. The right side of Fig. 3 represents the stored list of preset reference objects. Based on the semantic similarity and medical term correlation between the target objects and the preset reference objects, it can be determined that target object $\langle A_1, a_1 \rangle$ in $F_1$ is associated with reference object $\langle A'_1, a'_1 \rangle$ in the stored list, and target object $\langle A_1, a_2 \rangle$ in $F_1$ with reference object $\langle A'_1, a'_2 \rangle$. Target object $\langle A_2, a_3 \rangle$ in $F_2$ is associated with reference object $\langle A'_2, a'_3 \rangle$; the target objects $\langle A_1, a_1 \rangle$ and $\langle A_1, a_2 \rangle$ in $F_2$ are identical to those in $F_1$ and are therefore associated with the same reference objects. Target object $\langle A_3, a_4 \rangle$ in $F_3$ is associated with reference object $\langle A'_3, a'_4 \rangle$ in the stored list.
In the manner described above, the target objects included in each of the medical reports $F_1, F_2, \ldots, F_M$ ($M$ is a natural number greater than or equal to 1) can be associated with reference objects in the stored list; target objects included in other medical data can then be matched and analyzed within the range of the associated reference objects.
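The association step over many reports can be sketched as building a comparison table in which each distinct target object is matched once, so that repeated objects (such as $\langle A_1, a_1 \rangle$ in both $F_1$ and $F_2$) reuse the same reference object. `build_comparison_table` and the `match` callable are hypothetical; `match` stands in for the similarity and synonym models, and the toy data only reproduces the structure of the Fig. 3 example.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # <examination type, examination item>


def build_comparison_table(
    reports: List[List[Pair]],
    match: Callable[[Pair], Optional[Pair]],
) -> Dict[Pair, Pair]:
    """Associate each distinct target object with a reference object.

    `match` is a hypothetical stand-in for the matching models; it returns a
    reference object or None. Each distinct target is matched only once.
    """
    table: Dict[Pair, Pair] = {}
    seen: Set[Pair] = set()
    for report in reports:
        for target in report:
            if target in seen:
                continue  # identical target already handled in an earlier report
            seen.add(target)
            reference = match(target)
            if reference is not None:
                table[target] = reference
    return table


# Toy reproduction of the Fig. 3 structure; primes mark reference objects.
stored = {
    ("A1", "a1"): ("A'1", "a'1"),
    ("A1", "a2"): ("A'1", "a'2"),
    ("A2", "a3"): ("A'2", "a'3"),
    ("A3", "a4"): ("A'3", "a'4"),
}
reports = [
    [("A1", "a1"), ("A1", "a2")],                # F1
    [("A1", "a1"), ("A1", "a2"), ("A2", "a3")],  # F2
    [("A3", "a4"), ("A3", "a5"), ("A3", "a6")],  # F3
]
calls: List[Pair] = []


def toy_match(target: Pair) -> Optional[Pair]:
    calls.append(target)  # record calls to show each target is matched once
    return stored.get(target)


table = build_comparison_table(reports, toy_match)
```

Caching by target object is what makes the per-hospital table cheap to build: the expensive models run once per distinct expression, not once per report.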
According to an embodiment, in selecting at least one reference object from a plurality of reference objects based on semantic similarity and medical term correlation between the target object and a plurality of preset reference objects, a first neural network model may be used to select N reference objects, of the plurality of preset reference objects, whose semantic similarity to the target object meets a preset condition, and a second neural network model may be used to select K reference objects, of the N selected reference objects, whose medical terms are medical synonyms of medical terms related to the target object, where N and K are integers, and 1 ≦ K < N.
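The coarse-then-fine selection can be sketched as a two-stage ranking. Here `similarity` and `synonym_prob` are hypothetical callables standing in for the first and second neural network models, and the 0.5 synonym `threshold` is an assumed cut-off, not a value from the disclosure.

```python
from typing import Callable, List, Sequence


def two_stage_select(
    target: str,
    references: Sequence[str],
    similarity: Callable[[str, str], float],
    synonym_prob: Callable[[str, str], float],
    n: int,
    k: int,
    threshold: float = 0.5,
) -> List[str]:
    """Keep the top-N references by similarity, then at most K likely synonyms.

    `similarity` and `synonym_prob` are stand-ins for the first and second
    models; `threshold` is an assumed synonym-probability cut-off.
    """
    if not 1 <= k < n:
        raise ValueError("expected 1 <= K < N")
    # Coarse ranking: the N references most similar to the target.
    coarse = sorted(references, key=lambda r: similarity(target, r), reverse=True)[:n]
    # Fine ranking: keep candidates judged to be medical synonyms of the target.
    scored = [(r, synonym_prob(target, r)) for r in coarse]
    synonyms = [(r, p) for r, p in scored if p >= threshold]
    synonyms.sort(key=lambda rp: rp[1], reverse=True)
    return [r for r, _ in synonyms[:k]]


# Toy scores (hypothetical) for the target "orthopedics".
sim = {"orthopedics": 0.99, "bone surgery": 0.95, "ophthalmology": 0.6, "cardiology": 0.3}
syn = {"orthopedics", "bone surgery"}
result = two_stage_select(
    "orthopedics", list(sim), lambda t, r: sim[r],
    lambda t, r: 1.0 if r in syn else 0.0, n=3, k=2,
)
```

Running the cheap similarity model over all references and the synonym model only over N candidates is what yields the time saving the description attributes to coarse ranking.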
According to an embodiment, using the first neural network model to select, from the preset reference objects, the N reference objects whose semantic similarity to the target object meets the preset condition includes: calculating, with the first neural network model, the semantic similarity between the target object and each of the reference objects; dividing the reference objects into a plurality of confidence intervals based on the calculated similarities; and selecting, from the reference objects in the designated confidence interval(s), the N reference objects with the highest semantic similarity to the target object. According to an embodiment, the first neural network model may be a semantic similarity model, and the second neural network model may be a medical-domain synonym model.
For example, the high confidence interval may be the range in which the similarity between the target object and the reference object is greater than or equal to 0.99; the accuracy of matches in this interval can be regarded as 100%. The low confidence interval may be the range in which the similarity is greater than or equal to 0.9 and less than 0.99; the vast majority of matches in this interval are also correct. The unknown confidence interval may be the range in which the similarity is less than 0.9. Similarity in this interval is generally considered low (statistically, only a small fraction of matching results fall in it). However, a match falling in this interval is not necessarily wrong. For example, the match <cataract outpatient department, ophthalmology, 0.748952313> scores below 0.9 because the standard department list does not cover cataract clinics; nevertheless, it is still added to the target object's list of parent/child terms and aliases. According to an embodiment, the first neural network model yields a set of candidate reference objects for each target object. The N top-ranked candidates are retained as input to the subsequent second neural network model. Processing with the first neural network model thus acts as a coarse ranking step that filters out most unreasonable candidates before the later steps, saving considerable time.
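The interval bucketing can be sketched with the example thresholds 0.99 and 0.9. `confidence_interval` and `top_n_in_intervals` are hypothetical helpers; apart from the 0.748952313 ophthalmology score quoted above, the candidate names and scores in the usage are made up for illustration.

```python
from typing import List, Sequence, Tuple


def confidence_interval(score: float) -> str:
    """Bucket a similarity score using the example thresholds 0.99 and 0.9."""
    if score >= 0.99:
        return "high"
    if score >= 0.9:
        return "low"
    return "unknown"


def top_n_in_intervals(
    scored: Sequence[Tuple[str, float]],
    intervals: Sequence[str],
    n: int,
) -> List[Tuple[str, float]]:
    """Keep the N highest-scoring references whose score lies in the designated intervals."""
    kept = [(ref, s) for ref, s in scored if confidence_interval(s) in intervals]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical candidates for one target object.
scored = [("reference A", 0.995), ("reference B", 0.93), ("ophthalmology", 0.748952313)]
print(top_n_in_intervals(scored, ("high", "low"), n=5))
# [('reference A', 0.995), ('reference B', 0.93)]
```

Keeping the "unknown" interval selectable separately leaves room for the manual review the cataract example suggests.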
Further, according to an embodiment, before at least one reference object is selected for each target object, the frequency with which each extracted target object occurs in the medical data may be determined, and the extracted target objects may be filtered based on that frequency. In this way, most target objects that appear infrequently or are unimportant can be filtered out in advance, further saving time.
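Frequency filtering can be sketched with `collections.Counter`. The `min_count` threshold is a hypothetical parameter; the disclosure does not specify how the cut-off is chosen.

```python
from collections import Counter
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def filter_by_frequency(occurrences: Iterable[T], min_count: int) -> List[T]:
    """Return distinct target objects seen at least `min_count` times, most frequent first.

    `min_count` is an assumed, tunable threshold.
    """
    counts = Counter(occurrences)
    # most_common() orders by count; ties keep first-encountered order.
    return [obj for obj, c in counts.most_common() if c >= min_count]


extracted = [
    ("blood routine", "RBC"), ("blood routine", "HGB"), ("blood routine", "RBC"),
    ("urine routine", "protein"), ("blood routine", "RBC"), ("blood routine", "HGB"),
]
print(filter_by_frequency(extracted, min_count=2))
# [('blood routine', 'RBC'), ('blood routine', 'HGB')]
```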
According to an embodiment, a third neural network model may be used to analyze the data to be analyzed. Specifically, the third neural network model may be used to determine the target information of each target object in the data to be analyzed; the reference information of the reference object associated with each target object is then queried, and the target information of each target object is evaluated against that reference information. According to an embodiment, a comprehensive evaluation may then be performed based on the evaluation results of the individual target objects.
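The evaluation step can be sketched as a lookup through the comparison table followed by a range check. This assumes the reference information is a numeric reference range and that target values have already been extracted (the `values` dict stands in for the third model's output); `ReferenceRange` and `evaluate_report` are hypothetical names. The base excess range of -3 to +3 mmol/L used in the usage is the arterial reference interval given in this description.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReferenceRange:
    """Reference information stored with a reference object (assumed numeric)."""
    low: float
    high: float
    unit: str


def evaluate_value(value: float, ref: ReferenceRange) -> str:
    if value < ref.low:
        return "lower"
    if value > ref.high:
        return "higher"
    return "normal"


def evaluate_report(
    values: Dict[str, float],
    comparison: Dict[str, str],
    reference_db: Dict[str, ReferenceRange],
) -> Tuple[Dict[str, str], str]:
    """Evaluate each target object, then summarize the whole report.

    `values` stands in for the third model's output (target -> parsed value);
    `comparison` is the target -> reference comparison table.
    """
    per_item: Dict[str, str] = {}
    for target, value in values.items():
        reference = comparison.get(target)
        if reference is None or reference not in reference_db:
            per_item[target] = "unmapped"
        else:
            per_item[target] = evaluate_value(value, reference_db[reference])
    abnormal = [t for t, verdict in per_item.items() if verdict in ("lower", "higher")]
    summary = ("abnormal: " + ", ".join(abnormal)) if abnormal else "no abnormal items"
    return per_item, summary


per_item, summary = evaluate_report(
    values={"BE": 5.0, "unlisted item": 1.0},
    comparison={"BE": "base excess (BE)"},
    reference_db={"base excess (BE)": ReferenceRange(-3.0, 3.0, "mmol/L")},
)
```

Marking unmapped items explicitly, instead of silently skipping them, exposes gaps in the comparison table.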
In embodiments of the present disclosure, analysis and interpretation of single indicators is provided. First, for a medical report received from a hospital, natural language understanding is used to parse the examination types, examination items, item values, and examination result indicators (such as negative/positive or higher/lower) in the report. Through the mapping between target objects and reference objects obtained with the first and second neural network models, the corresponding examination item preset in the system can be triggered accurately, and the interpretation of that examination item can then be triggered by combining the abnormal examination indicator with the medical operations associated with the examination item in the knowledge graph.
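A minimal sketch of single-indicator interpretation, assuming indicators have already been extracted as short strings. The alias table (including the arrow symbols) and the dict used in place of the knowledge graph are hypothetical; the two BE interpretations are taken from the base excess example in this description.

```python
from typing import Dict, Optional, Tuple

# Hypothetical surface forms of result indicators; a real system would extend these per hospital.
INDICATOR_ALIASES: Dict[str, str] = {
    "positive": "positive", "+": "positive",
    "negative": "negative", "-": "negative",
    "higher": "higher", "high": "higher", "↑": "higher",
    "lower": "lower", "low": "lower", "↓": "lower",
}


def normalize_indicator(raw: str) -> Optional[str]:
    return INDICATOR_ALIASES.get(raw.strip().lower())


def interpret_single(
    item: str,
    raw_indicator: str,
    comparison: Dict[str, str],
    knowledge: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Map the item to its standard item, normalize the flag, look up an interpretation.

    `knowledge` is a plain dict standing in for the knowledge graph.
    """
    standard_item = comparison.get(item)
    flag = normalize_indicator(raw_indicator)
    if standard_item is None or flag is None:
        return None
    return knowledge.get((standard_item, flag))


comparison = {"BE": "base excess (BE)"}
knowledge = {
    ("base excess (BE)", "higher"): "often suggests metabolic alkalosis",
    ("base excess (BE)", "lower"): "often indicates metabolic acidosis",
}
print(interpret_single("BE", " ↑ ", comparison, knowledge))
# often suggests metabolic alkalosis
```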
Because single-indicator interpretation of examination results involves lengthy interpretation operations, it meets the needs of only some hospitals. To provide an experience closer to clinical practice, the examination interpretation function is extended to support composite judgments over multiple indicators. In an embodiment of the present disclosure, a composite analysis and interpretation of multiple items across multiple indicators is provided, in which a corresponding analysis result is recommended according to the combination of several abnormal examination indicators. For example, for a medical report received from a hospital, natural language understanding can be used to parse the examination type, examination items, item values, and examination result indicators, such as positive/negative, +/-, +/++/+++, ranges such as 1 to 2 (units), values greater than, less than, equal to, or less than or equal to a given value (units), and plain-text expressions (e.g., red, white). A corresponding analysis result is then recommended for the parsed combination of abnormal examination indicators by combining the knowledge graph with the relevant rule strategies.
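Composite interpretation can be sketched as rule matching over a set of findings, where a rule fires when all of its conditions are present and more specific rules are listed first. The ordinal scale for qualitative results, the cut-off that treats "+/-" as negative, and the rule contents are all hypothetical placeholders, not clinical rules from the disclosure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Finding = Tuple[str, str]  # (standard examination item, indicator)

# Hypothetical ordinal scale for qualitative results.
QUALITATIVE_GRADES: Dict[str, int] = {"-": 0, "+/-": 1, "+": 2, "++": 3, "+++": 4}


def qualitative_flag(result: str, positive_from: int = 2) -> str:
    """Turn '-', '+/-', '+', '++', '+++' into positive/negative (assumed cut-off)."""
    grade = QUALITATIVE_GRADES[result.strip()]
    return "positive" if grade >= positive_from else "negative"


def recommend(findings: Set[Finding], rules: Dict[FrozenSet[Finding], str]) -> List[str]:
    """Return results of all rules whose conditions all hold, most specific first."""
    matched = [(conds, result) for conds, result in rules.items() if conds <= findings]
    matched.sort(key=lambda cr: len(cr[0]), reverse=True)
    return [result for _, result in matched]


# Placeholder rules; real rules would come from the knowledge graph.
rules = {
    frozenset({("item A", "positive")}): "interpretation for A alone",
    frozenset({("item A", "positive"), ("item B", "higher")}): "interpretation for A + B",
}
findings = {("item A", qualitative_flag("++")), ("item B", "higher")}
print(recommend(findings, rules))
# ['interpretation for A + B', 'interpretation for A alone']
```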
In embodiments of the present disclosure, interpretations for different ranges within the same abnormal examination interval are also provided. Analysis of the abnormal examination results in medical reports shows that these results can have finer granularity: within the same examination interval, different ranges have different clinical meanings. Take base excess (BE) as an example. The BE value of a healthy person fluctuates around 0, and the reference interval for arterial blood is -3 to +3 mmol/L. An increased positive BE value often suggests metabolic alkalosis, whereas an increased negative BE value often indicates metabolic acidosis. In embodiments of the present disclosure, relevant strategies can be implemented in combination with the knowledge graph to analyze and interpret such cases. In embodiments of the present disclosure, the deviation of the target information of each target object from the reference information may be determined, and a risk cue may be generated based on the deviation.
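The deviation-based risk cue can be sketched for the base excess example. The assumption here is that "increased positive" and "increased negative" values correspond to readings above and below the -3 to +3 mmol/L arterial reference interval. `deviation` and `base_excess_cue` are hypothetical names, and finer clinical bands inside the abnormal range would need additional rules that this description does not specify.

```python
def deviation(value: float, low: float, high: float) -> float:
    """Signed distance outside [low, high]; 0.0 when the value is inside."""
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0


def base_excess_cue(be_mmol_per_l: float, low: float = -3.0, high: float = 3.0) -> str:
    """Risk cue for arterial base excess (BE) against the -3 to +3 mmol/L interval."""
    d = deviation(be_mmol_per_l, low, high)
    if d > 0:
        return f"BE above reference interval by {d:.1f} mmol/L: often suggests metabolic alkalosis"
    if d < 0:
        return f"BE below reference interval by {-d:.1f} mmol/L: often indicates metabolic acidosis"
    return "BE within reference interval"


print(base_excess_cue(5.0))
# BE above reference interval by 2.0 mmol/L: often suggests metabolic alkalosis
```

Returning the signed deviation, rather than only a flag, lets later rules grade severity by how far the value lies outside the interval.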
Fig. 4 is an example architecture for analyzing medical data according to an embodiment of the present disclosure, which may be used to implement the medical data analysis method of the foregoing embodiments. The architecture shown in Fig. 4 mainly includes two parts. The first part compares the examination items, in their various forms of expression, contained in a large number of medical reports with the list of standard examination items supported by the system, so as to determine from that list the standard examination item matching each examination item in the reports, thereby obtaining a comparison table between the various forms of examination items and the standard examination items. The second part looks up reference information for the examination items, such as reference ranges or reference values, in a database according to the comparison table, and analyzes received medical reports based on that reference information. The part of the architecture that compares target objects with reference objects comprises a multi-feature semantic similarity model (the first neural network model) and a medical-domain synonym model (the second neural network model), and the part that analyzes medical reports comprises a medical data analysis model (the third neural network model).
As shown in FIG. 4, a multi-feature based semantic similarity model may be used in the coarse ranking process. The training process of the multi-feature based semantic similarity model includes acquiring a plurality of first samples, the first samples including a target object, a reference object and a similarity between the target object and the reference object, and training the semantic similarity model (a first neural network model) using the acquired plurality of first samples.
A first sample may take the form of an object entity pair, in which one entity is the target object and the other is the reference object; the sample also includes the similarity between the two entities. According to an embodiment, multiple kinds of features may be extracted from sample data (for example, a number of medical examination reports) to obtain multiple kinds of feature vectors, and a feature matrix containing the first samples is generated from those feature vectors. In a particular embodiment, positive and negative samples are constructed in a given ratio from existing evidence-based pairs of medical entities (for example, diseases, surgeries, symptoms, and department names) and from unrelated entity pairs. Examples include "orthopedics" and "bone surgery", "intestinal department" and "outpatient clinic", and "coronary heart disease" and "coronary atherosclerotic heart disease". Several word-vector integration methods are then applied to the generated entity pairs. In a specific embodiment, the three feature acquisition methods used are FastText feature extraction, Jaccard feature extraction, and ELMo feature extraction. Corresponding feature matrices are generated from the three types of feature vectors and used to train a machine learning model.
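The sample-construction step can be illustrated with a small sketch. It assumes that evidence-based positive pairs are already available and draws negative pairs by re-pairing entities from different positive pairs at a configurable ratio. Only two features are computed: a character-set Jaccard similarity, and a character-trigram cosine similarity used as a crude, untrained stand-in for FastText-style subword features. The ELMo feature is omitted because it requires a trained model. The function names and the `neg_ratio`/`seed` parameters are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def trigram_cosine(a: str, b: str) -> float:
    """Cosine similarity of character trigram counts (no trained embeddings)."""
    def grams(s: str) -> Counter:
        s = f"<{s}>"  # boundary markers, as in subword models
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    ga, gb = grams(a), grams(b)
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = math.sqrt(sum(v * v for v in ga.values()) * sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def build_feature_matrix(positive_pairs: Sequence[Pair],
                         neg_ratio: int = 1,
                         seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y): one feature row and one 0/1 label per entity pair."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    lefts = [a for a, _ in positive_pairs]
    rights = [b for _, b in positive_pairs]
    wanted = neg_ratio * len(positive_pairs)
    negatives = set()
    attempts = 0
    # Re-pair entities at random and skip pairs that are known positives.
    while len(negatives) < wanted and attempts < 100 * max(wanted, 1):
        attempts += 1
        pair = (rng.choice(lefts), rng.choice(rights))
        if pair not in positives:
            negatives.add(pair)
    samples = [(a, b, 1) for a, b in positive_pairs]
    samples += [(a, b, 0) for a, b in sorted(negatives)]
    X = [[jaccard(a, b), trigram_cosine(a, b)] for a, b, _ in samples]
    y = [label for _, _, label in samples]
    return X, y
```

The resulting `X` and `y` can be handed to any binary classifier; swapping in real FastText or ELMo vectors would only change how the feature columns are computed.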
As shown in Fig. 4, the medical-domain synonym model may be used in the fine ranking process. Training this model includes acquiring a plurality of second samples, at least one of which includes a medical term and a synonym of that term and at least another of which includes a medical term and a non-synonym of that term, and training the medical-domain synonym model (the second neural network model) with the acquired second samples.
A second sample may likewise take the form of a medical synonym entity pair. When medical synonym entity pairs are constructed, some diseases and their aliases are obtained as candidate synonym pairs from the alias attributes in multi-source structured medical corpora. According to an embodiment, medical entity synonym pairs are first mined from clinical medical textbooks and multi-source medical corpora. In a specific embodiment, three main approaches may be used: 1) obtaining medical entity synonym pairs from the "alias" attribute in multi-source medical corpora, such as "snoring" and "obstructive sleep apnea-hypopnea syndrome", or "diffuse compact bone disease" and "systemic fragile sclerosis"; 2) applying predefined rule templates, for example "XXX is abbreviated as | commonly known as | translated as | an abbreviation of | … …"; 3) synthesizing candidate synonym pairs, such as "back pain" and "dull back pain" or "abdominal pain" and "persistent abdominal pain", from an existing list of medical entities combined with attribute dimensions such as frequency, intensity, color, duration, and location. Because these approaches can also produce entities that do not follow grammatical rules, a feature-based verification step filters the candidates and keeps the well-formed synonym pairs as part of the candidate synonym entity pairs. According to an embodiment, the features of an entity pair may include at least one of the following, computed for the two entities: Jaccard distance, cosine distance, Jaro-Winkler similarity, and text edit distance. Entity pairs are constructed with the methods above, the corresponding features are generated, and pairs are screened according to how many features reach their thresholds.
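The feature-based screening step can be sketched as follows. The sketch computes four similarities for a candidate pair (character-set Jaccard, character-bigram cosine, Jaro-Winkler, and a normalized edit-distance similarity) and keeps the pair when enough of them reach their thresholds. The threshold values in `DEFAULT_THRESHOLDS`, the `min_passed` count, and all function names are illustrative assumptions, not values taken from the disclosure.

```python
import math
from collections import Counter
from typing import Dict


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of character sets (1 - Jaccard distance)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def bigram_cosine(a: str, b: str) -> float:
    """Cosine similarity of character-bigram count vectors."""
    ga = Counter(a[i:i + 2] for i in range(len(a) - 1))
    gb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(c * gb[g] for g, c in ga.items())
    norm = math.sqrt(sum(v * v for v in ga.values()) * sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Standard Jaro-Winkler similarity with a common-prefix bonus (max prefix 4)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(x != y for x, y in zip(m1, m2)) / 2
    jaro = (matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def edit_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / length of the longer string."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    longest = max(len(a), len(b))
    return 1 - prev[-1] / longest if longest else 1.0


DEFAULT_THRESHOLDS: Dict[str, float] = {  # illustrative values only
    "jaccard": 0.5, "cosine": 0.5, "jaro_winkler": 0.85, "edit": 0.5,
}


def keep_pair(a: str, b: str,
              thresholds: Dict[str, float] = DEFAULT_THRESHOLDS,
              min_passed: int = 2) -> bool:
    """Keep a candidate pair if at least `min_passed` features reach their thresholds."""
    feats = {"jaccard": jaccard(a, b), "cosine": bigram_cosine(a, b),
             "jaro_winkler": jaro_winkler(a, b), "edit": edit_similarity(a, b)}
    passed = sum(feats[name] >= t for name, t in thresholds.items())
    return passed >= min_passed
```

Counting passed features, rather than requiring all of them, lets a pair such as an abbreviation and its expansion survive even when the edit-distance feature alone would reject it.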
In some embodiments, the system may further include a synonym-sentence recall model, or the function of such a model may be implemented by the medical-domain synonym model. This model compares the interpretation wording to be used for a medical report with the interpretation wording supported by the system, so as to find suitable interpretation wording for the various medical reports in the database. Medical synonym entity-sentence pairs may be generated before the recall model is trained. To generate a positive pair, one entity of a synonym pair is used to retrieve sentences containing it from a large clinical medical corpus, and each retrieved sentence is combined with the other entity; these pairs form the sample set for training the synonym-sentence recall module. To construct negative samples, entity pairs without a synonym relationship are first built, the medical corpus is then searched for sentences containing the target entity, and each sentence is combined with the other entity. A candidate pair set is generated from the resulting synonym entity-sentence pairs. The synonym-sentence recall module can be trained on top of a state-of-the-art pre-trained natural language processing model.
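The construction of entity-sentence samples can be sketched as below. Retrieval is reduced to a case-insensitive substring match over an in-memory list of sentences, which is an assumption standing in for whatever corpus search the system actually uses; the function name and the `max_hits` cap are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]
Sample = Tuple[str, str, int]  # (other entity, retrieved sentence, label)


def build_entity_sentence_pairs(synonym_pairs: Sequence[Pair],
                                non_synonym_pairs: Sequence[Pair],
                                corpus: Sequence[str],
                                max_hits: int = 3) -> List[Sample]:
    """Pair sentences that mention one entity with the other entity of the pair.

    Sentences retrieved for a synonym pair become positives (label 1); sentences
    retrieved for a non-synonym pair become negatives (label 0).
    """
    samples: List[Sample] = []
    for label, pairs in ((1, synonym_pairs), (0, non_synonym_pairs)):
        for target, other in pairs:
            # Case-insensitive substring match as a stand-in for corpus retrieval.
            hits = [s for s in corpus if target.lower() in s.lower()][:max_hits]
            samples.extend((other, sentence, label) for sentence in hits)
    return samples
```

Each `(entity, sentence, label)` triple can then serve as one sentence-pair example when fine-tuning a pre-trained matching model.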
Then, starting from the open-source ERNIE pre-trained language model, the model is fine-tuned on a medical-domain sentence-pair matching task using the previously constructed synonym entity-sentence candidate pair set and the corresponding labels, which completes the training of the medical-domain synonym model.
As shown in Fig. 4, medical data may be interpreted using the trained multi-feature semantic similarity model, the medical-domain synonym model, and the medical data analysis model. First, statistics may be collected over medical data that contain target objects (for example, test items) from test reports provided by a hospital, and the test items may be sorted in descending order of trigger frequency. These statistics allow the comparison to follow the customized requirements of different hospitals: if a hospital only needs the frequently triggered test items, only the test items whose trigger frequency exceeds a preset value (for example, more than 80% of the total frequency) are compared with the test items supported by the system (the reference objects). The hospital's examination-item mapping list and the system's examination-item list are then input, and each examination item in the hospital data is mapped to the examination items supported by the system. First, the semantic similarity model computes the similarity between each examination item and each system examination item; the results are divided into confidence intervals and sorted in descending order of similarity, and the top N mappings form a candidate mapping set. This step is called "coarse ranking". Its purpose is to obtain a reasonably accurate set of mappings through preliminary screening, reducing the computation required by the subsequent synonym model and the overall system overhead. The candidate set from coarse ranking is then fed into the medical synonym model, which computes refined similarity scores; the candidates are re-ranked, and the top K results (for example, with N ≫ K) are taken as the final comparison result.
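The frequency screening and the coarse-to-fine ranking described above can be sketched as two small functions. The scorers are passed in as plain callables standing in for the trained semantic similarity model and the medical synonym model. The names `frequent_items` and `two_stage_match`, the use of a single accepted confidence band, and the default values of `n`, `k`, and `band` are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]  # (hospital item, system item) -> similarity


def frequent_items(triggered: Sequence[str], min_count: int) -> List[str]:
    """Items triggered more than `min_count` times, most frequent first."""
    return [item for item, count in Counter(triggered).most_common() if count > min_count]


def two_stage_match(item: str,
                    system_items: Sequence[str],
                    coarse: Scorer,
                    fine: Scorer,
                    n: int = 20,
                    k: int = 3,
                    band: Tuple[float, float] = (0.5, 1.0)) -> List[str]:
    """Coarse ranking keeps the top-n system items whose coarse score falls in the
    accepted confidence band; fine ranking re-scores only those and keeps the top-k."""
    if not 1 <= k < n:
        raise ValueError("expected 1 <= k < n")
    low, high = band
    scored = [(ref, coarse(item, ref)) for ref in system_items]
    in_band = [(ref, s) for ref, s in scored if low <= s <= high]
    candidates = sorted(in_band, key=lambda rs: rs[1], reverse=True)[:n]
    rescored = [(ref, fine(item, ref)) for ref, _ in candidates]
    rescored.sort(key=lambda rs: rs[1], reverse=True)
    return [ref for ref, _ in rescored[:k]]
```

Because the fine scorer only sees at most `n` coarse candidates, its cost grows with `n` rather than with the size of the system item list, which is the point of the coarse stage.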
After K comparison results have been obtained for each test item, the medical data can be analyzed with the configured architecture. As shown in Fig. 4, two modes of outputting analysis results are provided: a single-case comparison mode (shown in the figure as the single-index interpretation strategy for an examination item) and a batch comparison mode (shown as the multi-item composite judgment strategy and the interpretation of different ranges within the same abnormal examination interval). The single-case mode is compatible with graphical-interface test tools and makes it convenient for a user to check the comparison relationships of a few entities: the mapping result can be viewed in the graphical interface simply by sending an HTTP request. The batch mode suits the customized batch comparison needs of different hospitals and also allows the various thresholds of each model to be customized.
As shown in Fig. 4, when the analysis result is output in the single-case comparison mode, risk cues such as an elevated result, acute suppurative bacterial infection, granulocytic leukemia, and sepsis may be output. When the analysis result is output in the batch comparison mode, cues about possible risks such as primary aldosteronism, metabolic alkalosis, and metabolic acidosis may be output.
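The multi-item composite judgment of the batch mode can be sketched as rules over per-item statuses: a cue fires only when every item it names has the required status. The rule format, the `composite_judgment` function, and the example rules are hypothetical illustrations, not diagnostic criteria taken from the disclosure.

```python
from typing import Dict, List, Tuple

# (risk cue, required status per standard examination item)
Rule = Tuple[str, Dict[str, str]]


def composite_judgment(statuses: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Return the cue of every rule whose item conditions are all satisfied.

    `statuses` maps a standard examination item to "low", "normal", or "high",
    for example as produced by a single-index interpretation step.
    """
    return [cue for cue, conditions in rules
            if all(statuses.get(item) == wanted for item, wanted in conditions.items())]


# Illustrative rules only; not clinical criteria from the disclosure.
EXAMPLE_RULES: List[Rule] = [
    ("possible primary aldosteronism", {"serum potassium": "low", "serum sodium": "high"}),
    ("possible metabolic alkalosis", {"base excess": "high"}),
    ("possible metabolic acidosis", {"base excess": "low"}),
]
```

An item missing from `statuses` never satisfies a condition, so incomplete reports do not trigger composite cues.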
Fig. 5 is a block diagram of an apparatus 500 for analyzing medical data according to another embodiment of the present disclosure. As shown in fig. 5, the medical data analysis apparatus 500 includes an extraction module 510, a comparison module 520, and an analysis module 530.
According to an embodiment, the extraction module 510 is configured to extract a plurality of target objects from the medical data. The comparison module 520 is configured to select, for each target object, at least one reference object from the plurality of reference objects based on semantic similarity and medical term correlation between the target object and a preset plurality of reference objects, and associate the selected at least one reference object with the target object. The analysis module 530 is configured to receive data to be analyzed including at least one target object, and analyze the data to be analyzed based on reference information of a reference object associated with the target object in the data to be analyzed.
For the specific operations of the functional modules, refer to the steps of the medical data analysis method 100 in the foregoing embodiments; they are not repeated here.
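One way to see how the three modules of apparatus 500 fit together is a small class in which each module is a method. The class and method names, the injected `select_references` callable (standing in for the coarse and fine ranking of comparison module 520), and the low/normal/high evaluation are assumptions for illustration; the disclosure does not prescribe this structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MedicalDataAnalyzer:
    """Hypothetical sketch of apparatus 500 with its three modules as methods."""
    select_references: Callable[[str], List[str]]     # comparison strategy (module 520)
    reference_ranges: Dict[str, Tuple[float, float]]   # reference information per reference object
    associations: Dict[str, List[str]] = field(default_factory=dict)

    def extract(self, reports: List[Dict[str, float]]) -> List[str]:
        """Extraction module 510: distinct target objects in first-seen order."""
        return list(dict.fromkeys(name for report in reports for name in report))

    def compare(self, targets: List[str]) -> None:
        """Comparison module 520: associate each target with its selected references."""
        for target in targets:
            self.associations[target] = self.select_references(target)

    def analyze(self, report: Dict[str, float]) -> Dict[str, str]:
        """Analysis module 530: evaluate each value against the associated reference range."""
        results: Dict[str, str] = {}
        for name, value in report.items():
            ref = next((r for r in self.associations.get(name, [])
                        if r in self.reference_ranges), None)
            if ref is None:
                results[name] = "unmapped"
                continue
            low, high = self.reference_ranges[ref]
            results[name] = "low" if value < low else "high" if value > high else "normal"
        return results
```

For instance, with `select_references` mapping both "WBC" and "Leukocytes" to "white blood cell count" and an illustrative range of (3.5, 9.5), a value of 12.1 for "WBC" is evaluated as "high", while an item with no associated reference is reported as "unmapped".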
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 is a block diagram of an electronic device 600 that may be used to implement the medical data analysis method of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604, to which an input/output (I/O) interface 605 is also connected.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the methods and processes described above, such as the medical data analysis method. For example, in some embodiments, the medical data analysis method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the medical data analysis method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured in any other suitable way (for example, by means of firmware) to perform the medical data analysis method.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of analyzing medical data, comprising:
extracting a plurality of target objects from the medical data;
for each target object, selecting at least one reference object from a plurality of preset reference objects based on semantic similarity and medical term correlation between the target object and the reference objects, and associating the selected at least one reference object with the target object; and
receiving data to be analyzed that contains at least one target object, and analyzing the data to be analyzed based on the reference information of the reference object associated with the target object in the data to be analyzed.
2. The method of claim 1, wherein the selecting at least one reference object from a plurality of reference objects based on semantic similarity and medical term correlation between the target object and the preset reference objects comprises:
selecting N reference objects, the semantic similarity of which with the target object meets a preset condition, from the preset multiple reference objects by using a first neural network model; and
selecting K reference objects from the N reference objects using a second neural network model, the medical terms referred to by the K reference objects being medical synonyms of the medical term referred to by the target object, where N and K are both integers and 1 ≤ K < N.
3. The method of claim 2, wherein selecting, using the first neural network model, N reference objects from the preset plurality of reference objects whose semantic similarity to the target object meets a preset condition comprises:
calculating a semantic similarity of the target object to each of the plurality of reference objects using a first neural network model;
partitioning the plurality of reference objects into a plurality of confidence intervals based on the calculated semantic similarity;
selecting, from a confidence interval specified among the plurality of confidence intervals, the N reference objects with the highest semantic similarity to the target object.
4. The method of claim 1, further comprising: before selecting, for each target object, at least one reference object from a plurality of reference objects based on semantic similarity and medical term correlation between the target object and the plurality of reference objects,
determining a frequency of occurrence of the extracted plurality of target objects in the medical data;
screening the extracted plurality of target objects based on the frequency of occurrence.
5. The method of claim 1, wherein analyzing the data to be analyzed comprises using a third neural network model to:
determine target information of each target object in the data to be analyzed;
query reference information of the reference object associated with each target object in the data to be analyzed; and
evaluate the target information of each target object based on the reference information.
6. The method of claim 5, further comprising:
performing a comprehensive evaluation based on the results of evaluating each target object in the data to be analyzed.
7. The method of claim 6, further comprising:
determining a deviation of the target information of each of the target objects with respect to the reference information;
generating a risk cue based on the deviation.
8. The method of claim 2, further comprising:
obtaining a plurality of first samples, wherein the first samples comprise a target object, a reference object and a similarity between the target object and the reference object;
training the first neural network model using the plurality of first samples.
9. The method of claim 8, further comprising:
performing a plurality of feature extractions on sample data, respectively, to obtain a plurality of feature vectors; and
generating a feature matrix comprising the plurality of first samples based on the plurality of feature vectors.
10. The method of claim 9, wherein the plurality of feature extractions comprises at least two of: FastText feature extraction, Jaccard feature extraction, and ELMo feature extraction.
11. The method of claim 2, further comprising:
obtaining a plurality of second samples, at least one of the plurality of second samples comprising a medical term and a synonym of the medical term, at least another one of the plurality of second samples comprising a medical term and a non-synonym of the medical term; and
training the second neural network model using the plurality of second samples.
12. The method of any one of claims 1 to 11,
the medical data comprises a plurality of medical reports;
the target object comprises an examination type in the medical report and at least one examination item under the examination type, and the target information of the target object comprises a value of the at least one examination item;
the data to be analyzed includes one or more of the plurality of medical reports, or one or more medical reports other than the plurality of medical reports.
13. The method of claim 12, further comprising:
acquiring interpretation wording related to the target object; and
selecting M reference objects from the N reference objects using the second neural network model, the interpretation wording related to the M reference objects semantically matching the interpretation wording of the target object, wherein M is an integer and 1 ≤ M < N.
14. An apparatus for analyzing medical data, comprising:
an extraction module for extracting a plurality of target objects from the medical data;
a comparison module, configured to select, for each target object, at least one reference object from a plurality of preset reference objects based on semantic similarity and medical term correlation between the target object and the reference objects, and associate the selected at least one reference object with the target object; and
an analysis module configured to receive data to be analyzed that contains at least one target object, and to analyze the data to be analyzed based on the reference information of the reference object associated with the target object in the data to be analyzed.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 13.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 13.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 13.
CN202011441333.XA 2020-12-11 2020-12-11 Medical data analysis method, apparatus, device, storage medium, and program product Active CN112562807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011441333.XA CN112562807B (en) 2020-12-11 2020-12-11 Medical data analysis method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011441333.XA CN112562807B (en) 2020-12-11 2020-12-11 Medical data analysis method, apparatus, device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112562807A true CN112562807A (en) 2021-03-26
CN112562807B CN112562807B (en) 2024-03-12

Family

ID=75062193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011441333.XA Active CN112562807B (en) 2020-12-11 2020-12-11 Medical data analysis method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN112562807B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095665A (en) * 2015-08-13 2015-11-25 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese disease diagnosis information
CN106383853A (en) * 2016-08-30 2017-02-08 刘勇 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis
CN109684445A (en) * 2018-11-13 2019-04-26 中国科学院自动化研究所 Colloquial style medical treatment answering method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
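Fan Xuexue; Wang Zhirong; Xu Wu; Liang Yin; Ma Xiaohu: "Research on a term similarity algorithm based on medical ontology", 现代图书情报技术 (New Technology of Library and Information Service), no. 12 *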

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111625A (en) * 2021-04-30 2021-07-13 善诊(上海)信息技术有限公司 Medical text label generation system and method and computer readable storage medium
CN113257371A (en) * 2021-06-03 2021-08-13 中南大学 Clinical examination result analysis method and system based on medical knowledge map
CN113257371B (en) * 2021-06-03 2022-02-15 中南大学 Clinical examination result analysis method and system based on medical knowledge map
CN113626688A (en) * 2021-07-21 2021-11-09 上海齐网网络科技有限公司 Intelligent medical data acquisition method and system based on software definition
CN113626688B (en) * 2021-07-21 2023-09-01 上海齐网网络科技有限公司 Intelligent medical data acquisition method and system based on software definition
CN114400062A (en) * 2021-12-21 2022-04-26 广州金域医学检验中心有限公司 Interpretation method and device of inspection report, computer equipment and storage medium
CN114400062B (en) * 2021-12-21 2024-03-22 广州金域医学检验中心有限公司 Interpretation method and device of inspection report, computer equipment and storage medium
CN114912804A (en) * 2022-05-17 2022-08-16 四川大学华西医院 Scientific research data related property control method and system

Also Published As

Publication number Publication date
CN112562807B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN112562807B (en) Medical data analysis method, apparatus, device, storage medium, and program product
Quan et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data
US8214224B2 (en) Patient data mining for quality adherence
CN109670054B (en) Knowledge graph construction method and device, storage medium and electronic equipment
CN108027823B (en) Information processing device, information processing method, and computer-readable storage medium
US8145644B2 (en) Systems and methods for providing access to medical information
CN113345577B (en) Diagnosis and treatment auxiliary information generation method, model training method, device, equipment and storage medium
US20120101846A1 (en) Computer-Implemented Method For Displaying Patient-Related Diagnoses Of Chronic Illnesses
US8548827B2 (en) Computer-implemented method for medical diagnosis support
CN111382275A (en) Construction method, device and medium of medical knowledge graph and electronic equipment
US20200058408A1 (en) Systems, methods, and apparatus for linking family electronic medical records and prediction of medical conditions and health management
Ma et al. Using the shapes of clinical data trajectories to predict mortality in ICUs
CN112115697A (en) Method, device, server and storage medium for determining target text
JP2015018462A (en) Medical chart system and medical chart search method
Chandra et al. Natural language Processing and Ontology based Decision Support System for Diabetic Patients
CN116189857A (en) Triage grade determining method and device, electronic equipment and storage medium
CN113808758A (en) Method and device for verifying data standardization, electronic equipment and storage medium
CN111261298A (en) Medical data quality pre-judging method and device, readable medium and electronic equipment
CN115719640A (en) System, device, electronic equipment and storage medium for recognizing primary and secondary symptoms of traditional Chinese medicine
EP3230907B1 (en) System and method for uniformly correlating unstructured entry features to associated therapy features
CN114664421A (en) Doctor-patient matching method and device, electronic equipment, medium and product
CN114595322A (en) Insurance product recommendation method and device
Colak et al. Design, validation and performance of aspartate aminotransferase-and lactate dehydrogenase-reporting algorithms for haemolysed specimens including correction within quality specifications
CN112711579A (en) Medical data quality detection method and device, storage medium and electronic equipment
Shojaee-Mend et al. Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant