CN111816321B - System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria - Google Patents

Info

Publication number
CN111816321B
Authority
CN
China
Prior art keywords
case classification
infectious disease
feature
standard
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010659267.7A
Other languages
Chinese (zh)
Other versions
CN111816321A (en
Inventor
杜乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Donghu Big Data Technology Co ltd
Original Assignee
Wuhan Donghu Big Data Trading Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Donghu Big Data Trading Center Co ltd filed Critical Wuhan Donghu Big Data Trading Center Co ltd
Priority to CN202010659267.7A priority Critical patent/CN111816321B/en
Publication of CN111816321A publication Critical patent/CN111816321A/en
Application granted granted Critical
Publication of CN111816321B publication Critical patent/CN111816321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a system, a device and a storage medium for intelligently identifying infectious diseases based on legal diagnostic standards. The system comprises an index construction module, an information extraction module, a standard database, a first text mining module, a second text mining module and a feature matching module. The invention determines the specific indexes for infectious disease classification based on the legal diagnostic standards for infectious diseases, and establishes a standard database by extracting the infectious disease case classification standards and the main feature information contained in each of them. A feature-information vector space model is used to extract the features of the infectious disease diagnostic standards, and intelligent identification is performed using cosine similarity, mutual information similarity, and a combination of the two, thereby achieving accurate case identification, intelligent cognition and auxiliary diagnosis.

Description

System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria
Technical Field
The invention relates to the technical field of infectious disease prevention and treatment, in particular to a system, equipment and a storage medium for intelligently identifying infectious diseases based on legal diagnostic standards.
Background
Infectious diseases are a group of diseases caused by various pathogens that can be transmitted from person to person, from animal to person, or from animal to animal. The number of statutory infectious diseases regulated nationally has increased from 37 to 39: 2 in Class A, 26 in Class B and 11 in Class C. Class A infectious diseases include plague and cholera; Class B infectious diseases include infectious atypical pneumonia, AIDS, viral hepatitis, etc.; Class C infectious diseases include pulmonary tuberculosis, schistosomiasis, filariasis, echinococcosis, leprosy, influenza, epidemic parotitis, rubella, neonatal tetanus, acute hemorrhagic conjunctivitis, and infectious diarrheal diseases other than cholera, dysentery, typhoid fever and paratyphoid fever, etc.
The legal infectious disease diagnostic standards mainly refer to the catalog of infectious disease diagnostic standards (trial) governed by the Law of the People's Republic of China on the Prevention and Treatment of Infectious Diseases. In the traditional identification method, after certain symptom feature information of a patient is found or collected, it is compared and screened manually against the nationally published diagnostic standards to judge whether it meets, or basically meets, the legal diagnostic standard. This is time-consuming and labor-intensive. Feature information that is ambiguous, or only similar to that in the legal diagnostic standard, is sometimes difficult to judge correctly, so a large amount of information is lost, which leads to missed reports and false reports. Therefore, an auxiliary diagnosis system that intelligently identifies infectious diseases based on the legal diagnostic standards is needed to achieve accurate case identification, intelligent cognition and auxiliary diagnosis.
Disclosure of Invention
In view of the above, the invention provides a system for intelligently identifying infectious diseases based on legal diagnostic standards, which solves the problem that traditional manual comparison and screening of infectious diseases can lead to missed reports and false reports, and which helps medical staff perform auxiliary diagnosis of infectious diseases.
In a first aspect of the present invention, a system for intelligent identification of infectious diseases based on legal diagnostic criteria is presented, the system comprising:
an index construction module: used for drafting and constructing the specific indexes of the legal infectious disease case classifications and diagnostic standards according to the legal infectious disease diagnostic standards;
an information extraction module: used for extracting the main feature information contained in each infectious disease case classification according to the specific indexes of the legal infectious disease case classifications and diagnostic standards;
a standard database: used for establishing the association between the diagnostic standards of the various infectious diseases, and of the different case classification types of the same infectious disease, and the corresponding main feature information;
a first text mining module: used for performing text mining on the main feature information of the standard database, performing weight calculation and first core feature word extraction, and constructing a vector space model to obtain a first feature vector set corresponding to the main feature information of each infectious disease case classification;
a second text mining module: used for constructing a feature selection model based on conditional mutual information, performing text mining on the main feature information of the standard database using a TF-IDF function, performing weight calculation according to the correlation between the entries of the main feature information and the case classifications, selecting second core feature words, and constructing a vector space model to obtain a second feature vector set corresponding to the main feature information of each infectious disease case classification;
a feature matching module: used for calculating the cosine similarity between the text to be classified and each element of the first feature vector set; calculating the mutual information correlation between the text to be classified and each element of the second feature vector set; and classifying the text to be classified into case classifications according to the cosine similarity and the mutual information correlation.
Preferably, in the first text mining module, a TF-IDF function is used to calculate the entry weight of the main feature information:
let $D$ be a document set containing $m$ documents and let $D_i$ be the feature vector of the $i$-th document, so that $D = \{D_1, D_2, \dots, D_m\}$ and $D_i = (d_{i1}, d_{i2}, \dots, d_{in})$, $i = 1, 2, \dots, m$, where $d_{ij}$ is the weight of the $j$-th entry $t_j$ in document $D_i$:
[Formula: Figure GDA0003594068060000021]
where $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$; $N$ is the total number of documents in the document database, and $N_j$ is the number of documents in the document database that contain entry $t_j$.
Preferably, in the feature matching module, the weight calculation is performed according to the correlation between the entry of the main feature information and the case classification, and the selecting of the second core feature word specifically includes:
calculating the mutual information correlation between each entry of the main feature information contained in the case classifications and each case classification, using the formula:
[Formula: Figure GDA0003594068060000031]
where A is the number of documents in case classification category c in which entry t appears; B is the number of documents in categories other than c in which entry t appears; C is the number of documents in category c in which entry t does not appear; N is the total number of documents in all categories; if the number of categories is m, each entry obtains m correlation values;
and taking the average of the m values as the weight of each entry; sorting the entries from low to high by word frequency and removing words that appear in only a single category and whose word frequency is below a preset word frequency threshold; then sorting the remaining entries from high to low by weight and taking the words whose weight is above a preset weight threshold as the second core feature words.
Preferably, in the feature matching module, classifying the text to be classified according to the cosine similarity and the mutual information correlation specifically includes:
for each case classification category, taking the maximum of the cosine similarity and the mutual information correlation as the output probability value of that category, setting a first probability threshold, and taking the categories whose probability value is greater than the first probability threshold as the recognition recommendation result.
Preferably, in the feature matching module, classifying the text to be classified according to the cosine similarity and the mutual information correlation specifically includes:
for each case classification category, taking the weighted sum of the cosine similarity and the mutual information correlation as the output probability value of that category, setting a second probability threshold, and taking the categories whose probability value is greater than the second probability threshold as the recognition recommendation result.
In a second aspect of the present invention, an electronic device is disclosed, comprising: at least one processor, at least one memory, a communication interface, and a bus;
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor which are invoked by the processor to implement the system according to the first aspect of the invention.
In a third aspect of the invention, a computer-readable storage medium is disclosed, which stores computer instructions for causing a computer to implement the system of the first aspect of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1) based on the current legal diagnostic standards for infectious diseases, the invention establishes a standard database of the association between the diagnostic standards of different infectious diseases, and of the different case classification types of the same infectious disease, and the corresponding main feature information; the standard database provides a comprehensive, standardized feature information base for the different infectious diseases and their case classification types, and provides a basis for auxiliary diagnosis and accurate identification of the various infectious diseases;
2) based on the standard database, the vector space model is applied to feature extraction from the infectious disease diagnostic standards, which effectively addresses the two key problems of type classification and feature information extraction, greatly reduces the loss of feature information, and improves the accuracy of intelligent identification and diagnosis;
3) the invention performs intelligent identification using cosine similarity, mutual information similarity, and a combination of the two; this cross-comparison from several angles further improves diagnostic accuracy, provides medical personnel with reliable auxiliary diagnosis results, and reduces missed reports and false reports.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, the present invention provides a system for intelligently identifying infectious diseases based on legal diagnostic criteria, the system comprising:
an index construction module: used for drafting and constructing the specific indexes of the legal infectious disease case classifications and diagnostic standards according to the legal infectious disease diagnostic standards;
an information extraction module: used for extracting the main feature information contained in each infectious disease case classification according to the specific indexes of the legal infectious disease case classifications and diagnostic standards;
a standard database: used for establishing the association between the diagnostic standards of the various infectious diseases, and of the different case classification types of the same infectious disease, and the corresponding main feature information;
a first text mining module: used for performing text mining on the main feature information of the standard database, performing weight calculation and first core feature word extraction using TF-IDF, and constructing a vector space model to obtain a first feature vector set corresponding to the main feature information of each infectious disease case classification;
a second text mining module: used for constructing a feature selection model based on conditional mutual information, performing text mining on the main feature information of the standard database, performing weight calculation according to the correlation between the entries of the main feature information and the case classifications, selecting second core feature words, and constructing a vector space model to obtain a second feature vector set corresponding to the main feature information of each infectious disease case classification;
a feature matching module: used for calculating the cosine similarity between the text to be classified and each element of the first feature vector set; calculating the mutual information correlation between the text to be classified and each element of the second feature vector set; and classifying the text to be classified into case classifications according to the cosine similarity and the mutual information correlation.
Embodiments of the invention are further described below in connection with specific classes of infectious diseases.
The index construction module drafts and constructs the specific indexes of the legal infectious disease case classifications and diagnostic standards. Whether the case types of the legal diagnostic standards are classified accurately determines whether the identification system can search the features of the various infectious diseases quickly and accurately, and therefore affects the matching speed. Take the legal diagnostic standard for infectious atypical pneumonia (trial) as an example:
1. Epidemiological history, comprising two items: 1.1 a history of close contact with a patient, or being one of a cluster of infected patients, or clear evidence of having infected others; 1.2 within two weeks before onset, having visited or lived in an area where infectious atypical pneumonia patients were reported and secondary infections occurred;
2. Symptoms and signs: acute onset with fever as the first symptom; body temperature generally above 38 ℃, occasionally with chills; may be accompanied by headache, joint pain, muscle pain, fatigue and diarrhea; usually without upper respiratory catarrhal symptoms; there may be cough, mostly dry with little sputum and occasionally with blood-streaked sputum; there may be chest tightness, and severe patients show rapid breathing, shortness of breath or obvious respiratory distress. Lung signs are not obvious; in some patients a few moist rales can be heard, or there are signs of lung consolidation. Note: a few patients do not have fever as the first symptom, especially patients with a recent history of surgery or with underlying disease;
3. Laboratory examination results: the peripheral blood leukocyte count is generally not elevated, or is decreased; the lymphocyte count is often decreased;
4. Chest X-ray examination results: the lungs show patchy or flaky infiltrative shadows or reticular changes of varying degree; in some patients these progress rapidly into large flaky shadows; the changes are often multilobar or bilateral; the shadows dissipate slowly, and the lung shadows may be inconsistent with the symptoms and signs. If the examination result is negative, it should be repeated after 1-2 days;
5. Treatment with antibacterial drugs has no obvious effect.
From the above legal diagnostic standard, the case classification types of the legal diagnostic standard for infectious atypical pneumonia (trial) can be determined:
1) Suspected diagnosis standard: meets items 1.1+2+3, or 1.2+2+4, or 2+3+4 above;
2) Clinical diagnosis standard: meets items 1.1+2+4 or more, or 1.2+2+4+5, or 1.2+2+3+4 above;
3) Medical observation diagnosis standard: meets items 1.2+2+3 above;
4) Differential diagnosis: respiratory system diseases with similar clinical manifestations must be excluded clinically, such as upper respiratory infection, influenza, bacterial or fungal pneumonia, AIDS complicated with lung infection, legionnaires' disease, tuberculosis, epidemic hemorrhagic fever, lung tumor, noninfectious interstitial disease, pulmonary edema, pulmonary atelectasis, pulmonary embolism, pulmonary eosinophilic infiltration and pulmonary vasculitis;
5) Diagnosis standard for severe atypical pneumonia: severe "atypical pneumonia" can be diagnosed when one of the following criteria is met: A. dyspnea, with respiratory rate >30 breaths/minute; B. hypoxemia: under oxygen inhalation at 3-5 L/min, arterial partial pressure of oxygen PaO2 <70 mmHg or pulse oximetry oxygen saturation SpO2 <93%, or an existing diagnosis of acute lung injury (ALI) or acute respiratory distress syndrome (ARDS); C. multilobar lesions with the lesion range exceeding 1/3, or chest X-ray showing >50% lesion progression within 48 hours; D. shock or multiple organ dysfunction syndrome (MODS); E. severe underlying disease, or combined with other infections, or age >50 years.
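The item combinations above lend themselves to a simple rule table. The following is a minimal sketch, not the patented implementation: the names `CASE_RULES` and `matching_classifications` are hypothetical, the criterion codes follow the numbering used in the text, and "1.1+2+4 or more" is interpreted (as an assumption) as "any set of met items that contains 1.1, 2 and 4".

```python
# Hypothetical rule table: each classification lists the item combinations
# that satisfy it, using the criterion numbering from the text above.
CASE_RULES = {
    "suspected": [{"1.1", "2", "3"}, {"1.2", "2", "4"}, {"2", "3", "4"}],
    "clinical": [{"1.1", "2", "4"}, {"1.2", "2", "4", "5"}, {"1.2", "2", "3", "4"}],
    "medical_observation": [{"1.2", "2", "3"}],
}


def matching_classifications(met_items):
    """Return every classification that has at least one combination fully met.

    Assumption: a combination counts as met when all of its items are in
    met_items; additional met items do not disqualify it.
    """
    met = set(met_items)
    return [
        name
        for name, combinations in CASE_RULES.items()
        if any(combo <= met for combo in combinations)
    ]


if __name__ == "__main__":
    # A patient meeting items 1.2, 2 and 3 only matches medical observation.
    print(matching_classifications({"1.2", "2", "3"}))
```

A rule table of this kind only covers the explicit item combinations; the differential and severe-case standards in 4) and 5) would need their own checks.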
After the specific indexes of the legal infectious disease case classifications and diagnostic standards are determined, the information extraction module extracts the main feature information contained in each infectious disease case classification, which serves as the core feature information for identifying, verifying or distinguishing the different case classification standards of the same infectious disease.
The core feature information of each infectious disease case classification plays an important role in the infectious disease identification system as the most essential detail feature for verifying or distinguishing infectious diseases. For example, human avian influenza can be diagnosed, after other diseases are excluded, on the basis of epidemiological history, clinical manifestations and laboratory test results. The main feature information contained in the case classifications of human infection with highly pathogenic avian influenza is then: 1. Medical observation cases: an epidemiological history with clinical manifestations appearing within 1 week; or a history of close contact with human avian influenza patients with clinical manifestations appearing within 1 week; 2. Suspected cases: the patient's respiratory secretion specimen tests positive in antigen detection using monoclonal antibodies against influenza A virus and H subtype; 3. Confirmed cases: an epidemiological history and clinical manifestations, with a specific virus isolated from the patient's respiratory secretion specimen or avian influenza H subtype viral genes detected by RT-PCR, and a 4-fold or greater rise in antibody titer against avian influenza virus between paired sera taken in the early stage of onset and in the convalescent period.
As another example, the core feature information of the legal cholera diagnostic standard includes: 1. Feature information of the suspected cholera diagnostic standard: a. a first case with typical clinical symptoms, such as severe diarrhea and watery stool (yellow-water-like, clear-water-like, rice-water-like or bloody-water-like) accompanied by vomiting, with rapid onset of severe dehydration, circulatory failure and muscle spasm (especially of the gastrocnemius), before etiological examination has confirmed the diagnosis; b. during a cholera epidemic, a definite contact history (such as shared meals, cohabitation or caregiving) and onset of diarrhea and vomiting symptoms with no other identifiable cause. Meeting either of the above items gives a diagnosis of suspected cholera; 2. Feature information of the confirmed diagnostic standard: a. diarrhea symptoms, with stool culture positive for Vibrio cholerae serogroup O1 or O139; b. typical cholera symptoms (see 1a) in an epidemic area during a cholera epidemic, with stool culture negative for Vibrio cholerae O1 and O139 but no other identifiable cause; c. diarrhea symptoms in an epidemic area during the epidemic period, with paired serum antibody titers showing a rise of 4-fold or more in the serum agglutination test, or of 8-fold or more in the vibriocidal antibody test; d. in tracing the source of the epidemic, diarrhea symptoms within 5 days before or after the first stool culture in which Vibrio cholerae O1 or O139 is detected. Clinical diagnosis: meets b. Confirmed case: meets a, c or d.
A standard database is then established from the association between the diagnostic standards of the various infectious diseases, and of the different case classification types of the same infectious disease, and the corresponding main feature information. The standard database is comprehensive and standardized, and serves as the standard feature information base for auxiliary diagnosis and identification of infectious diseases.
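One way to picture the standard database is as a collection of records keyed by disease and case classification type. The sketch below is only an illustration under that assumption: `StandardEntry`, `STANDARD_DB` and `entries_for` are hypothetical names, and the feature texts are abbreviated from the cholera example rather than taken from a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardEntry:
    disease: str          # infectious disease name
    classification: str   # case classification type within that disease
    features: str         # main feature information text for this classification


# Illustrative entries only, abbreviated from the cholera example in the text.
STANDARD_DB = [
    StandardEntry("cholera", "suspected",
                  "severe diarrhea, watery stool, vomiting, severe dehydration, muscle spasm"),
    StandardEntry("cholera", "confirmed",
                  "diarrhea symptoms, stool culture positive for Vibrio cholerae"),
]


def entries_for(disease):
    """Return all case classification entries stored for one disease."""
    return [entry for entry in STANDARD_DB if entry.disease == disease]


if __name__ == "__main__":
    for entry in entries_for("cholera"):
        print(entry.classification, "->", entry.features)
```

Each record's `features` text is what the two text mining modules would treat as one document.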
The invention respectively carries out text mining on the characteristic information of the standard database through a first text mining module and a second text mining module to construct a vector space model.
Existing feature information extraction algorithms usually require a series of preprocessing steps supported by prior knowledge. These preprocessing steps often cause a large loss of information, which leads to missed and erroneous extraction of detail nodes (feature information) and in turn reduces the identification accuracy of the whole system. To overcome these shortcomings of traditional algorithms, the vector space model is applied to feature extraction from the infectious disease diagnostic standards, which effectively addresses the two key problems of type classification and feature information extraction, greatly reduces the loss of feature information, and improves the accuracy of intelligent identification and diagnosis. Specifically, the main feature information corresponding to one case classification type diagnostic standard in the database is represented by feature entries $(T_1, T_2, \dots, T_n)$ and their weights $\omega_i$, forming a space vector; during information matching, these feature items are used to evaluate the degree of correlation between an unknown text and the space vectors in the database.
The first text mining module adopts TF-IDF to carry out weight calculation and first core feature word extraction, and a vector space model is constructed to obtain a first feature vector set corresponding to main feature information of each infectious disease case classification; in the first feature vector set, each feature vector represents a diagnosis standard of an infectious disease case classification and corresponding main feature information.
Let $D$ be a document set containing $m$ documents and let $D_i$ be the feature vector of the $i$-th document, so that $D = \{D_1, D_2, \dots, D_m\}$ and $D_i = (d_{i1}, d_{i2}, \dots, d_{in})$, $i = 1, 2, \dots, m$, where $d_{ij}$ is the weight of the $j$-th entry $t_j$ in document $D_i$:
[Formula: Figure GDA0003594068060000081]
where $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$; $N$ is the total number of documents in the document database, and $N_j$ is the number of documents in the document database that contain entry $t_j$.
And after the entry weight is obtained through calculation, screening out a first core characteristic word according to the weight, and forming a vector space model by the first core characteristic word and the corresponding weight. Through the vector space model, text data is converted into structured data which can be processed by a computer, and the similarity problem between two documents is converted into the similarity problem between two vectors.
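As an illustration of the weighting step, the sketch below builds TF-IDF vectors for a small tokenized corpus. The patent's exact weight formula is only available as an image, so this assumes the common form: term frequency multiplied by log(N / N_j), without normalization. The function name `tfidf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents.

    Assumed weight (a common TF-IDF form, not confirmed by the source):
        d_ij = tf_ij * log(N / N_j)
    where N is the number of documents and N_j the number containing term j.
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency N_j: count each term once per document.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors


# Hypothetical tokenized feature texts, one per case classification.
docs = [
    ["fever", "cough", "dyspnea"],
    ["diarrhea", "vomiting", "dehydration"],
    ["fever", "diarrhea"],
]
vocab, vectors = tfidf_vectors(docs)

if __name__ == "__main__":
    for doc, vec in zip(docs, vectors):
        print(doc, [round(w, 3) for w in vec])
```

With this form, an entry that appears in every document receives weight 0, which is why entries specific to a few case classifications dominate the vectors.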
Suppose that the feature vector corresponding to a certain category in the first feature vector set of the standard database is $V_k$ and the feature vector of the text to be classified is $V_0$. The similarity between the two vectors can be measured by the cosine of the angle $\theta$ between them:
$$\cos\theta = \frac{V_k \cdot V_0}{\lVert V_k \rVert \, \lVert V_0 \rVert}$$
A smaller angle indicates a higher similarity.
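The cosine comparison can be sketched in a few lines. This is a minimal illustration that assumes both vectors are defined over the same ordered vocabulary; `cosine_similarity` and `best_match` are hypothetical helper names, and returning 0.0 for a zero vector is a convention chosen here rather than something the source specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def best_match(text_vector, class_vectors):
    """Return (category, similarity) for the most similar category vector.

    class_vectors maps a category name to its feature vector V_k.
    """
    scored = ((name, cosine_similarity(text_vector, vec))
              for name, vec in class_vectors.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    classes = {"suspected": [1.0, 0.0], "confirmed": [1.0, 1.0]}
    print(best_match([1.0, 1.0], classes))
```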
The second text mining module constructs a feature selection model based on conditional mutual information and performs text mining on the main feature information of the standard database. It performs weight calculation according to the correlation between the entries of the main feature information and the case classifications, selects the second core feature words, and constructs a vector space model to obtain a second feature vector set corresponding to the main feature information of each infectious disease case classification.
Take the case classifications of a certain infectious disease as an example: the main feature information corpora of the case classifications, such as the suspected diagnosis standard, clinical diagnosis standard, confirmed diagnosis standard, medical observation diagnosis standard, severe diagnosis standard and differential diagnosis standard, are selected, and words are chosen through mutual information to build the vector space model.
Firstly, calculating the mutual information correlation degree between each entry of the main characteristic information contained in case classification and the case classification, wherein the formula is as follows:
Figure GDA0003594068060000083
where A is the number of documents in case classification category c in which the entry t appears; B is the number of documents in categories other than c in which the entry t appears; C is the number of documents in case classification category c in which the entry t does not appear; and N is the total number of documents in all categories. If the number of categories is m, each entry obtains m correlation values.
The average of the m values is taken as the weight of each entry. The entries are sorted from low to high by word frequency, and words that appear in only a single category and whose word frequency is below a preset word frequency threshold are removed. The remaining entries are sorted from high to low by weight, and the words whose weight is higher than a preset weight threshold are taken as the second core feature words. A feature vector is then constructed from the second core feature words and their corresponding weights.
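The selection procedure above can be sketched as follows. The formula image is not reproduced in this text, so the sketch assumes the widely used estimate MI(t, c) = log(A × N / ((A + C) × (A + B))), which matches the definitions of A, B, C, and N given above. Returning 0.0 when A is zero is a simplification of this sketch, the low-frequency filtering step is omitted for brevity, and all names and sample data are hypothetical.

```python
import math


def mutual_information(a, b, c, n):
    """Assumed estimate: log(A*N / ((A+C)*(A+B)))."""
    if a == 0:
        return 0.0  # assumption of this sketch: avoids log(0)
    return math.log(a * n / ((a + c) * (a + b)))


def mi_weights(docs_by_category):
    """docs_by_category maps a category name to a list of entry sets."""
    n = sum(len(docs) for docs in docs_by_category.values())
    vocab = {t for docs in docs_by_category.values() for d in docs for t in d}
    weights = {}
    for term in vocab:
        total_with_term = sum(
            term in d for docs in docs_by_category.values() for d in docs
        )
        scores = []
        for docs in docs_by_category.values():
            a = sum(term in d for d in docs)  # in category, contains term
            b = total_with_term - a           # other categories, contains term
            c = len(docs) - a                 # in category, lacks term
            scores.append(mutual_information(a, b, c, n))
        # The weight of an entry is the average over all m categories.
        weights[term] = sum(scores) / len(scores)
    return weights


def second_core_words(weights, weight_threshold):
    """Entries ranked by weight, keeping those above the threshold."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, w in ranked if w > weight_threshold]


# Hypothetical corpus: two case classification categories, two documents each.
corpus = {
    "suspected": [{"fever", "contact"}, {"fever"}],
    "confirmed": [{"fever", "pcr_positive"}, {"pcr_positive"}],
}
mi = mi_weights(corpus)
selected = second_core_words(mi, weight_threshold=0.1)
```

In this sample, "fever" appears in both categories and receives a low average weight, while "contact" and "pcr_positive" each appear in only one category and are retained.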
The feature matching module performs feature matching according to the results of the first text mining module and the second text mining module. It calculates the cosine similarity between the text to be classified and each element of the first feature vector set, and calculates the mutual information correlation between the text to be classified and each element of the second feature vector set. Mutual information measures the correlation between a piece of feature information and a specific class: the larger the mutual information, the stronger the correlation between the feature information and the class and the higher the probability of belonging to that class, and the smaller the mutual information, the lower that probability. The text to be classified is then assigned a case classification according to the cosine similarity and the mutual information correlation.
There are several specific ways to classify the text to be classified according to the cosine similarity and the mutual information correlation:
1. For each case classification category, take the weighted sum of the cosine similarity and the mutual information correlation as the first output probability value of that category, arrange the output probability values in descending order, set a first probability threshold, and take the categories whose probability value is greater than the first probability threshold as the identification recommendation result.
2. For each case classification category, take the maximum of the cosine similarity and the mutual information correlation as the second output probability value of that category, arrange the output probability values in descending order, set a second probability threshold, and take the categories whose probability value is greater than the second probability threshold as the identification recommendation result.
There may be one or more identification recommendation results, arranged in descending order. The category with the higher similarity under either mode, or the categories obtained by combining and cross-comparing the two modes, is selected as the recommended diagnosis result, which gives medical staff a multi-directional auxiliary diagnosis reference. For ambiguous or merely plausible feature information that some medical staff sometimes cannot judge or cannot determine correctly, efficient feature information matching produces an accurate auxiliary diagnosis, helps medical staff make correct judgments, and reduces missed reports and erroneous reports.
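The two classification modes described above can be sketched as a single function. This is an illustration under stated assumptions: the per-category scores are taken to be already computed and scaled to a comparable range, and the function name, the weighting parameter alpha, the default threshold, and the sample scores are hypothetical rather than values given in the patent.

```python
def recommend(cosine_scores, mi_scores, mode="weighted", alpha=0.5, threshold=0.5):
    """Return (category, score) pairs above the threshold, highest first.

    mode="weighted": alpha * cosine + (1 - alpha) * mutual information.
    mode="max":      the larger of the two scores.
    """
    combined = {}
    for category, cos in cosine_scores.items():
        mi = mi_scores[category]
        if mode == "weighted":
            combined[category] = alpha * cos + (1 - alpha) * mi
        elif mode == "max":
            combined[category] = max(cos, mi)
        else:
            raise ValueError(f"unknown mode: {mode}")
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(category, score) for category, score in ranked if score > threshold]


# Hypothetical scores for one text to be classified.
cosine_scores = {"suspected": 0.9, "clinical": 0.4, "confirmed": 0.3}
mi_scores = {"suspected": 0.5, "clinical": 0.7, "confirmed": 0.1}

by_weighted_sum = recommend(cosine_scores, mi_scores, mode="weighted")
by_maximum = recommend(cosine_scores, mi_scores, mode="max", threshold=0.8)
```

Running both modes and comparing the returned categories corresponds to the cross-comparison of the two modes mentioned in the text.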
The present invention also discloses an electronic device, comprising: at least one processor, at least one memory, a communication interface, and a bus;
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to implement the system for intelligent identification of infectious diseases based on legal diagnostic criteria, which comprises an index construction module, an information extraction module, a standard database, a first text mining module, a second text mining module, and a feature matching module.
The invention also discloses a computer-readable storage medium that stores computer instructions, and the computer instructions enable a computer to implement all or part of the system according to the embodiment of the invention, for example the index construction module, the information extraction module, the standard database, the first text mining module, the second text mining module, and the feature matching module. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A system for intelligent identification of infectious diseases based on legal diagnostic criteria, said system comprising:
an index construction module: configured to formulate the specific indexes of legal infectious disease case classification and diagnostic standards according to the legal infectious disease diagnostic standards;
an information extraction module: configured to extract the main feature information contained in each infectious disease case classification according to the specific indexes of legal infectious disease case classification and diagnostic standards;
a standard database: configured to establish the association between the diagnostic standards of the various infectious diseases, the different case classification categories of the same infectious disease, and the corresponding main feature information;
a first text mining module: configured to perform text mining on the main feature information of the standard database, perform weight calculation and first core feature word extraction using the TF-IDF function, and construct a vector space model to obtain a first feature vector set corresponding to the main feature information of each infectious disease case classification;
a second text mining module: configured to construct a feature selection model based on conditional mutual information, perform text mining on the main feature information of the standard database, perform weight calculation according to the mutual information correlation between the entries of the main feature information and the case classifications, select the second core feature words, and construct a vector space model to obtain a second feature vector set corresponding to the main feature information of each infectious disease case classification;
a feature matching module: configured to calculate the cosine similarity between the text to be classified and each element of the first feature vector set; calculate the mutual information correlation between the text to be classified and each element of the second feature vector set; and perform case classification on the text to be classified according to the cosine similarity and the mutual information correlation.
2. The system for intelligent infectious disease identification based on statutory diagnostic criteria as claimed in claim 1, wherein the first text mining module calculates the entry weights of the main characteristic information using TF-IDF function:
let $D$ be a document set comprising $m$ documents, and let $D_i$ be the feature vector of the $i$-th document, so that $D = \{D_1, D_2, \ldots, D_m\}$ and $D_i = (d_{i1}, d_{i2}, \ldots, d_{in})$, $i = 1, 2, \ldots, m$, where $d_{ij}$ is the weight of the $j$-th entry $t_j$ in document $D_i$:
Figure FDA0003594068050000011
where $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; $N$ is the total number of documents in the standard database; and $N_j$ is the number of documents in the standard database that contain the entry $t_j$.
3. The system for intelligent infectious disease identification based on statutory diagnostic criteria as claimed in claim 1, wherein in the feature matching module, the weight calculation is performed according to the correlation between the entry of the main feature information and the case classification, and the second core feature word is selected as follows:
calculating the mutual information correlation degree between each entry of the main characteristic information contained in the case classification and the case classification, wherein the formula is as follows:
Figure FDA0003594068050000021
wherein A is the number of documents in case classification category c in which the entry t appears; B is the number of documents in categories other than c in which the entry t appears; C is the number of documents in case classification category c in which the entry t does not appear; N is the total number of documents in all categories; and if the number of categories is m, each entry obtains m correlation values;
and taking the average of the m values as the weight of each entry, sorting the entries from low to high by word frequency, removing words that appear in only a single category and whose word frequency is below a preset word frequency threshold, sorting the remaining entries from high to low by weight, and taking the words whose weight is higher than a preset weight threshold as the second core feature words.
4. The system for intelligent infectious disease identification based on statutory diagnostic criteria as claimed in claim 1, wherein in said feature matching module, said case classification of the text to be classified according to said cosine similarity and mutual information correlation is specifically:
and for each case classification category, taking the maximum value of the cosine similarity and the mutual information correlation as the output probability value of the corresponding case classification category, setting a first probability threshold, and taking the category with the probability value larger than the first probability threshold as the recognition recommendation result.
5. The system for intelligent infectious disease identification based on statutory diagnostic criteria as claimed in claim 1, wherein in said feature matching module, said case classification of the text to be classified according to said cosine similarity and mutual information correlation is specifically:
and for each case classification category, taking the weighted sum of the cosine similarity and the correlation as the output probability value of the corresponding case classification category, setting a second probability threshold, and taking the category with the probability value larger than the second probability threshold as the recognition recommendation result.
6. An electronic device, comprising:
at least one processor, at least one memory, a communication interface, and a bus;
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor, which are invoked by the processor to implement the system of any one of claims 1 to 5.
7. A computer-readable storage medium storing computer instructions that cause a computer to implement the system of any one of claims 1 to 5.
CN202010659267.7A 2020-07-09 2020-07-09 System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria Active CN111816321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010659267.7A CN111816321B (en) 2020-07-09 2020-07-09 System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010659267.7A CN111816321B (en) 2020-07-09 2020-07-09 System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria

Publications (2)

Publication Number Publication Date
CN111816321A CN111816321A (en) 2020-10-23
CN111816321B true CN111816321B (en) 2022-06-14

Family

ID=72842203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659267.7A Active CN111816321B (en) 2020-07-09 2020-07-09 System, apparatus and storage medium for intelligent infectious disease identification based on legal diagnostic criteria

Country Status (1)

Country Link
CN (1) CN111816321B (en)

Also Published As

Publication number Publication date
CN111816321A (en) 2020-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 1101, 11th Floor, Building B4, Future Science and Technology City, No. 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province, 430000

Patentee after: Wuhan Donghu Big Data Technology Co.,Ltd.

Country or region after: China

Address before: 430000 Room 2101, F3 Building, Phase I, Longshan Innovation Park, 999 High-tech Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province

Patentee before: WUHAN DONGHU BIG DATA TRADING CENTER Co.,Ltd.

Country or region before: China