CN106844325B - Medical information processing method and medical information processing apparatus
- Publication number
- CN106844325B (application CN201510886242.XA)
- Authority
- CN
- China
- Prior art keywords
- medical
- words
- association
- texts
- medical texts
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F19/34—
Abstract
The invention provides a medical information processing method and a medical information processing device. The method comprises the following steps: performing word segmentation on a plurality of medical texts and clustering the plurality of medical texts; determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; judging, according to the association degree of every two medical texts, whether the words of any two medical texts in the medical texts of the same category have an association relation; and, when the judgment result is yes, storing the words having the association relation in association with each other. With this technical scheme, words having an association relation in medical texts can be mined more accurately and comprehensively, so that a medical word bank constructed from those words is more accurate and comprehensive.
Description
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a medical information processing method and a medical information processing apparatus.
Background
At present, the informatization of medical services is an international trend. With the rapid development of information technology, more and more hospitals in China are accelerating overall construction based on an informatization platform and a Hospital Information System (HIS) in order to improve their service level and core competitiveness. Medical informatization not only improves the working efficiency of doctors, giving them more time to serve patients, but also improves patient satisfaction and trust, and in doing so builds the hospital's image as a technologically advanced institution. Therefore, the gradual integration of medical service applications with the basic network platform is becoming a new direction for the informatization of domestic hospitals, especially large and medium-sized hospitals.
In the medical informatization process, the construction of a medical word stock is important and fundamental work. It helps to realize electronic medical records, the analysis of the large number of unstructured medical texts on the Internet, and the intelligent analysis of patients' medical records. Although well-established medical word stock systems exist abroad, they are not suitable for domestic use, where Chinese is the native language. Resources such as English-Chinese parallel corpora and Chinese medicine and pharmacy lexicons have also been constructed domestically; however, the words in the domestic medical lexicons are not comprehensive and are not always accurate.
Therefore, how to construct a more accurate and comprehensive medical word stock becomes a problem to be solved urgently.
Disclosure of Invention
Based on the above problems, the invention provides a new technical scheme that can mine words having an association relation in medical texts more accurately and comprehensively, so that a medical word bank constructed from those words is more accurate and comprehensive.
In view of the above, an aspect of the present invention provides a medical information processing method, including: performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts; determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts; and when the judgment result is yes, performing association storage on the words with the association relation.
In this technical scheme, the association degree of every two medical texts is determined according to the words in every two medical texts of the same category, whether an association relation exists between the words of any two medical texts of the same category is judged according to those association degrees, and the words having an association relation are stored in association with each other, for example in a medical word bank, so as to construct a more complete medical word bank. For example, the words in medical text A are cold and fever, the words in medical text B are fever and cough, and the words in medical text C are cough and chill. A and B contain similar words (two terms that both mean fever), and the association degree between A and B is 30%; B and C contain the same word (cough), and the association degree between B and C is 50%. A and C contain no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that an association relation exists between the words of A and the words of C. The scheme can therefore also mine words with an implicit association relation, so that the words having an association relation in medical texts are dug out more accurately and comprehensively. Furthermore, a search engine for medical information can be built on the words having an association relation, or medical text information can be analyzed automatically, which makes it convenient for outpatient doctors and patients to look up diseases and symptoms.
Preferably, the plurality of medical texts may be electronic medical records from the medical system of a hospital, or may be obtained from professional medical websites by a crawler program. Because the volume of medical texts is large, they may be stored in a distributed file system.
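The implicit association between A and C can be sketched as reachability in a graph of texts. This is a minimal illustration, not the patented procedure: the function name `associated_text_pairs`, the `threshold` parameter, and the direct A-C degree of 0 (no shared words) are assumptions made for the example; only the 30% and 50% figures come from the text.

```python
from itertools import combinations

def associated_text_pairs(degrees, threshold=0.0):
    """Return pairs of texts linked directly or through a chain of associated texts."""
    # Build an undirected graph: an edge joins two texts whose
    # association degree exceeds the (assumed) threshold.
    neighbours = {}
    for (a, b), degree in degrees.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if degree > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a depth-first search; every pair
    # inside one component is treated as associated.
    seen, pairs = set(), set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        pairs.update(combinations(sorted(component), 2))
    return pairs

# Degrees from the example; the direct A-C value of 0.0 is an assumption
# standing for "no identical or similar words".
degrees = {("A", "B"): 0.30, ("B", "C"): 0.50, ("A", "C"): 0.0}
pairs = associated_text_pairs(degrees)
```

Here `pairs` contains `("A", "C")` even though A and C share no words, because both are connected to B.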
In the above technical solution, preferably, the step of performing association storage on the words with association relationship further includes: determining the association degree of words in any two medical texts according to the association degree of any two medical texts; and storing the association degree of the words in any two medical texts.
In this technical scheme, the association degree of the words in any two medical texts is determined according to the association degree of those two medical texts. Specifically, the association degree of the two medical texts may be used directly as the association degree of their words, or the association degree of the words may be calculated according to a preset algorithm, so that the strength of the relation between words is reflected accurately and intuitively by a stored value. For example, the words in medical text A are cold and fever, and the words in medical text C are cough and chill; if the association degree between A and C is 10%, the association degree between cold and cough is 10%.
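The simpler of the two options, reusing the text-level degree as the word-level degree, can be sketched as follows. The function name `word_association_degrees` is hypothetical, and the second word of text C is written here as "chill"; the preset algorithm mentioned as the alternative is not specified in the text and is not modelled.

```python
from itertools import product

def word_association_degrees(words_a, words_b, text_degree):
    """Assign the text-level association degree to every cross-text word pair."""
    return {
        (word_a, word_b): text_degree
        for word_a, word_b in product(words_a, words_b)
        if word_a != word_b  # a word is not stored as associated with itself
    }

# Texts A and C from the example, with an association degree of 10%.
word_degrees = word_association_degrees(["cold", "fever"], ["cough", "chill"], 0.10)
```

Each of the four cross-text pairs, including `("cold", "cough")`, receives the degree 0.10.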
In any one of the above technical solutions, preferably, the step of segmenting the plurality of medical texts specifically includes: and performing word segmentation on the medical texts according to the dictionary and the parts of speech of the words in the medical texts.
In this technical scheme, the medical texts are segmented according to the words in a dictionary (preferably a medical dictionary) and the parts of speech of words. Specifically, the texts are first segmented according to the words in the dictionary; if a word in a medical text does not exist in the dictionary, whether it is associated with the preceding and following words, and whether they need to be combined into a new word, is judged according to its part of speech. This effectively avoids wrongly segmented and omitted words and thus ensures the accuracy and comprehensiveness of word segmentation.
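One possible reading of this two-stage segmentation is sketched below. The text names neither the matching strategy nor the merge rule, so both are assumptions: forward maximum matching is used for the dictionary stage, and adjacent out-of-dictionary tokens are merged when a hypothetical `pos_of` tagger gives them the same part of speech. The sample dictionary and sentence are illustrative only.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

def merge_unknown(tokens, dictionary, pos_of):
    """Merge adjacent out-of-dictionary tokens that share a part of speech."""
    merged = []  # entries are [text, part_of_speech, is_in_dictionary]
    for token in tokens:
        known = token in dictionary
        pos = pos_of(token)
        if merged and not known and not merged[-1][2] and merged[-1][1] == pos:
            merged[-1][0] += token  # extend the candidate new word
        else:
            merged.append([token, pos, known])
    return [text for text, _, _ in merged]

dictionary = {"咳嗽", "发烧"}            # cough, fever
raw = segment("咳嗽发烧咽痒", dictionary)  # "咽痒" (itchy throat) is not in the dictionary
# Stub tagger (assumption): every token is tagged as a noun.
words = merge_unknown(raw, dictionary, pos_of=lambda token: "n")
```

The dictionary stage leaves the two unknown characters as separate tokens; the merge stage combines them into one candidate new word.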
In any one of the above technical solutions, preferably, the step of clustering the plurality of medical texts specifically includes: clustering the plurality of medical texts according to international disease classification and K-means algorithm.
In this technical scheme, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because the medical texts in one resulting category concern the same disease, their words are likely to be associated; restricting further processing to texts of the same category therefore keeps the processing fast.
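The text does not say how the ICD and K-means are combined. The sketch below assumes one plausible arrangement: texts are first grouped by an ICD code attached to each record, and each group is then split with a plain K-means on word vectors. The record layout, the codes, the vectors, and the deterministic choice of initial centroids are all assumptions for illustration.

```python
from collections import defaultdict

def kmeans(points, k, iterations=10):
    """Plain K-means on equal-length numeric vectors; returns one label per point."""
    centroids = [list(p) for p in points[:k]]  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_texts(records, k):
    """Group records by ICD code, then split each group with K-means."""
    by_icd = defaultdict(list)
    for record in records:
        by_icd[record["icd"]].append(record)
    clusters = {}
    for icd, group in by_icd.items():
        labels = kmeans([r["vector"] for r in group], min(k, len(group)))
        for record, label in zip(group, labels):
            clusters.setdefault((icd, label), []).append(record["id"])
    return clusters

# Illustrative records: ids, ICD codes and two-dimensional vectors are made up.
records = [
    {"id": "t1", "icd": "J00", "vector": [1.0, 0.0]},
    {"id": "t2", "icd": "J00", "vector": [0.9, 0.1]},
    {"id": "t3", "icd": "J00", "vector": [0.0, 1.0]},
    {"id": "t4", "icd": "I51", "vector": [0.5, 0.5]},
]
clusters = cluster_texts(records, k=2)
```

Pairwise association degrees then only need to be computed inside each resulting cluster, which is where the speed benefit described above would come from.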
In any one of the above technical solutions, preferably, the step of performing association storage on the words with association relations specifically includes: and storing the words with the association relation according to the attributes of the words with the association relation.
In this technical scheme, the words having an association relation are stored according to their attributes. For example, the attributes of a word may be: body part (such as head and limbs), predicate (such as pain and strain), disease (such as fever and heart disease), medicine (such as Gregorian tablets and glucose injection), treatment means (such as drip and anesthesia), and ignored word, which does not contribute to information extraction (such as home and patient). Storing words by attribute makes the storage of associated words more orderly.
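A minimal in-memory sketch of attribute-based storage is given below. The class name `Lexicon`, its methods, and the attribute labels are hypothetical; the text only states that words are stored according to their attributes and that association degrees can be stored with them.

```python
from collections import defaultdict

# Attribute labels paraphrased from the categories listed above.
ATTRIBUTES = {"body part", "predicate", "disease", "medicine", "treatment", "ignored"}

class Lexicon:
    """Stores words grouped by attribute, plus symmetric word associations."""

    def __init__(self):
        self.words = defaultdict(set)          # attribute -> set of words
        self.associations = defaultdict(dict)  # word -> {associated word: degree}

    def add_word(self, word, attribute):
        if attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute}")
        self.words[attribute].add(word)

    def associate(self, first, second, degree):
        # Store the association in both directions so either word can be looked up.
        self.associations[first][second] = degree
        self.associations[second][first] = degree

lexicon = Lexicon()
lexicon.add_word("fever", "disease")
lexicon.add_word("heart disease", "disease")
lexicon.add_word("head", "body part")
lexicon.add_word("glucose injection", "medicine")
lexicon.associate("cold", "cough", 0.10)  # degree taken from the earlier example
```

A persistent store (a database table keyed by attribute, for instance) would follow the same shape; the in-memory version is used here only to keep the example self-contained.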
Another aspect of the present invention provides a medical information processing apparatus including: the processing unit is used for segmenting a plurality of medical texts and clustering the medical texts; the first determination unit is used for determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; the judging unit is used for judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts; and the storage unit is used for associating and storing the words with the association relation when the judgment result is yes.
In this technical scheme, the association degree of every two medical texts is determined according to the words in every two medical texts of the same category, whether an association relation exists between the words of any two medical texts of the same category is judged according to those association degrees, and the words having an association relation are stored in association with each other, for example in a medical word bank, so as to construct a more complete medical word bank. For example, the words in medical text A are cold and fever, the words in medical text B are fever and cough, and the words in medical text C are cough and chill. A and B contain similar words (two terms that both mean fever), and the association degree between A and B is 30%; B and C contain the same word (cough), and the association degree between B and C is 50%. A and C contain no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that an association relation exists between the words of A and the words of C. The scheme can therefore also mine words with an implicit association relation, so that the words having an association relation in medical texts are dug out more accurately and comprehensively. Furthermore, a search engine for medical information can be built on the words having an association relation, or medical text information can be analyzed automatically, which makes it convenient for outpatient doctors and patients to look up diseases and symptoms.
Preferably, the plurality of medical texts may be electronic medical records from the medical system of a hospital, or may be obtained from professional medical websites by a crawler program. Because the volume of medical texts is large, they may be stored in a distributed file system.
In the above technical solution, preferably, the storage unit includes: the second determining unit is used for determining the association degree of the words in any two medical texts according to the association degree of any two medical texts; the storage unit is specifically configured to store the association degrees of the words in any two medical texts.
In this technical scheme, the association degree of the words in any two medical texts is determined according to the association degree of those two medical texts. Specifically, the association degree of the two medical texts may be used directly as the association degree of their words, or the association degree of the words may be calculated according to a preset algorithm, so that the strength of the relation between words is reflected accurately and intuitively by a stored value. For example, the words in medical text A are cold and fever, and the words in medical text C are cough and chill; if the association degree between A and C is 10%, the association degree between cold and cough is 10%.
In any one of the above technical solutions, preferably, the processing unit includes: and the word cutting unit is used for cutting words of the medical texts according to the dictionary and the parts of speech of the words in the medical texts.
In this technical scheme, the medical texts are segmented according to the words in a dictionary (preferably a medical dictionary) and the parts of speech of words. Specifically, the texts are first segmented according to the words in the dictionary; if a word in a medical text does not exist in the dictionary, whether it is associated with the preceding and following words, and whether they need to be combined into a new word, is judged according to its part of speech. This effectively avoids wrongly segmented and omitted words and thus ensures the accuracy and comprehensiveness of word segmentation.
In any one of the above technical solutions, preferably, the processing unit includes: and the clustering unit is used for clustering the plurality of medical texts according to the international disease classification and the K-means algorithm.
In this technical scheme, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because the medical texts in one resulting category concern the same disease, their words are likely to be associated; restricting further processing to texts of the same category therefore keeps the processing fast.
In any of the foregoing technical solutions, preferably, the storage unit is specifically configured to store the words having an association relationship according to the attribute of the words having an association relationship.
In this technical scheme, the words having an association relation are stored according to their attributes. For example, the attributes of a word may be: body part (such as head and limbs), predicate (such as pain and strain), disease (such as fever and heart disease), medicine (such as Gregorian tablets and glucose injection), treatment means (such as drip and anesthesia), and ignored word, which does not contribute to information extraction (such as home and patient). Storing words by attribute makes the storage of associated words more orderly.
With the technical scheme of the invention, words having an association relation in medical texts can be mined more accurately and comprehensively, so that a medical word bank constructed from those words is more accurate and comprehensive.
Drawings
Fig. 1 shows a flow diagram of a medical information processing method according to an embodiment of the invention;
Fig. 2 shows a schematic configuration diagram of a medical information processing apparatus according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of a medical information processing apparatus according to an embodiment of the invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a flow diagram of a medical information processing method according to an embodiment of the present invention.
As shown in fig. 1, a medical information processing method according to an embodiment of the present invention includes:
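Step 102, performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts;
Step 104, determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;
Step 106, judging whether words of any two medical texts in the medical texts of the same category have an association relation according to the association degree of every two medical texts;
Step 108, when the judgment result is yes, performing association storage on the words with the association relation.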
In this technical scheme, the association degree of every two medical texts is determined according to the words in every two medical texts of the same category, whether an association relation exists between the words of any two medical texts of the same category is judged according to those association degrees, and the words having an association relation are stored in association with each other, for example in a medical word bank, so as to construct a more complete medical word bank. For example, the words in medical text A are cold and fever, the words in medical text B are fever and cough, and the words in medical text C are cough and chill. A and B contain similar words (two terms that both mean fever), and the association degree between A and B is 30%; B and C contain the same word (cough), and the association degree between B and C is 50%. A and C contain no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that an association relation exists between the words of A and the words of C. The scheme can therefore also mine words with an implicit association relation, so that the words having an association relation in medical texts are dug out more accurately and comprehensively. Furthermore, a search engine for medical information can be built on the words having an association relation, or medical text information can be analyzed automatically, which makes it convenient for outpatient doctors and patients to look up diseases and symptoms.
Preferably, the plurality of medical texts may be electronic medical records from the medical system of a hospital, or may be obtained from professional medical websites by a crawler program. Because the volume of medical texts is large, they may be stored in a distributed file system.
In the above technical solution, preferably, step 108 further includes: determining the association degree of words in any two medical texts according to the association degree of any two medical texts; and storing the association degree of the words in any two medical texts.
In this technical scheme, the association degree of the words in any two medical texts is determined according to the association degree of those two medical texts. Specifically, the association degree of the two medical texts may be used directly as the association degree of their words, or the association degree of the words may be calculated according to a preset algorithm, so that the strength of the relation between words is reflected accurately and intuitively by a stored value. For example, the words in medical text A are cold and fever, and the words in medical text C are cough and chill; if the association degree between A and C is 10%, the association degree between cold and cough is 10%.
In any one of the above technical solutions, preferably, the step of segmenting the plurality of medical texts specifically includes: and performing word segmentation on the medical texts according to the dictionary and the parts of speech of the words in the medical texts.
In this technical scheme, the medical texts are segmented according to the words in a dictionary (preferably a medical dictionary) and the parts of speech of words. Specifically, the texts are first segmented according to the words in the dictionary; if a word in a medical text does not exist in the dictionary, whether it is associated with the preceding and following words, and whether they need to be combined into a new word, is judged according to its part of speech. This effectively avoids wrongly segmented and omitted words and thus ensures the accuracy and comprehensiveness of word segmentation. Preferably, only medical words are kept from the segmentation result, so that irrelevant words (such as every day, patients, home) do not interfere when the association degree of the medical texts is determined.
In any one of the above technical solutions, preferably, the step of clustering the plurality of medical texts specifically includes: clustering the plurality of medical texts according to international disease classification and K-means algorithm.
In this technical scheme, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because the medical texts in one resulting category concern the same disease, their words are likely to be associated; restricting further processing to texts of the same category therefore keeps the processing fast.
In any of the above technical solutions, preferably, step 108 specifically includes: and storing the words with the association relation according to the attributes of the words with the association relation.
In this technical scheme, the words having an association relation are stored according to their attributes. For example, the attributes of a word may be: body part (such as head and limbs), predicate (such as pain and strain), disease (such as fever and heart disease), medicine (such as Gregorian tablets and glucose injection), treatment means (such as drip and anesthesia), and ignored word, which does not contribute to information extraction (such as home and patient). Storing words by attribute makes the storage of associated words more orderly.
Fig. 2 shows a schematic configuration diagram of a medical information processing apparatus according to an embodiment of the present invention.
As shown in fig. 2, a medical information processing apparatus 200 according to an embodiment of the present invention includes: the processing unit 202 is configured to perform word segmentation on a plurality of medical texts and perform clustering on the plurality of medical texts; the first determining unit 204 is configured to determine, according to words of every two medical texts in the medical texts of the same category, a degree of association between every two medical texts; the judging unit 206 is configured to judge whether words of any two medical texts in the medical texts of the same category have an association relationship according to the association degree of each two medical texts; and a storage unit 208, configured to, if the determination result is yes, associate and store the words having an association relationship.
In this technical scheme, the association degree of every two medical texts is determined according to the words in every two medical texts of the same category, whether an association relation exists between the words of any two medical texts of the same category is judged according to those association degrees, and the words having an association relation are stored in association with each other, for example in a medical word bank, so as to construct a more complete medical word bank. For example, the words in medical text A are cold and fever, the words in medical text B are fever and cough, and the words in medical text C are cough and chill. A and B contain similar words (two terms that both mean fever), and the association degree between A and B is 30%; B and C contain the same word (cough), and the association degree between B and C is 50%. A and C contain no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that an association relation exists between the words of A and the words of C. The scheme can therefore also mine words with an implicit association relation, so that the words having an association relation in medical texts are dug out more accurately and comprehensively. Furthermore, a search engine for medical information can be built on the words having an association relation, or medical text information can be analyzed automatically, which makes it convenient for outpatient doctors and patients to look up diseases and symptoms.
Preferably, the plurality of medical texts may be electronic medical records from the medical system of a hospital, or may be obtained from professional medical websites by a crawler program. Because the volume of medical texts is large, they may be stored in a distributed file system.
In the above technical solution, preferably, the storage unit 208 includes: the second determining unit 2082, configured to determine association degrees of words in any two medical texts according to the association degrees of any two medical texts; the storage unit 208 is specifically configured to store the association degrees of the words in any two medical texts.
In this technical scheme, the association degree of the words in any two medical texts is determined according to the association degree of those two medical texts. Specifically, the association degree of the two medical texts may be used directly as the association degree of their words, or the association degree of the words may be calculated according to a preset algorithm, so that the strength of the relation between words is reflected accurately and intuitively by a stored value. For example, the words in medical text A are cold and fever, and the words in medical text C are cough and chill; if the association degree between A and C is 10%, the association degree between cold and cough is 10%.
In any of the above technical solutions, preferably, the processing unit 202 includes: the word segmentation unit 2022 is configured to segment words of the plurality of medical texts according to the dictionary and parts of speech of the words in the plurality of medical texts.
In this technical scheme, the medical texts are segmented according to the words in a dictionary (preferably a medical dictionary) and the parts of speech of words. Specifically, the texts are first segmented according to the words in the dictionary; if a word in a medical text does not exist in the dictionary, whether it is associated with the preceding and following words, and whether they need to be combined into a new word, is judged according to its part of speech. This effectively avoids wrongly segmented and omitted words and thus ensures the accuracy and comprehensiveness of word segmentation. Preferably, only medical words are kept from the segmentation result, so that irrelevant words (such as every day, patients, home) do not interfere when the association degree of the medical texts is determined.
In any of the above technical solutions, preferably, the processing unit 202 includes: a clustering unit 2024, configured to cluster the plurality of medical texts according to international disease classification and K-means algorithm.
In this technical solution, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because medical texts clustered into the same category concern the same disease, their words are likely to be associated with one another; further processing is then limited to medical texts of the same category, which ensures the processing speed.
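One way the two clustering criteria could be combined is sketched below: texts are first grouped by a coarse ICD key and then split further with K-means over bag-of-words vectors. The source does not say how ICD and K-means interact, so this ordering is an assumption, as are the function names, the use of the first letter of an ICD-10 code as the grouping key, and the deterministic initialisation from the first `k` vectors (a real implementation would normally use a better initialisation).

```python
from collections import defaultdict

def kmeans(vectors, k, iterations=10):
    """Plain K-means; centroids start at the first k vectors (simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[idx] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_texts(texts, k=2):
    """texts: list of (icd_code, words). Returns {(icd_key, label): [text indices]}."""
    by_icd = defaultdict(list)
    for idx, (icd_code, _) in enumerate(texts):
        by_icd[icd_code[0]].append(idx)  # coarse key: first letter of the code
    clusters = defaultdict(list)
    for key, indices in by_icd.items():
        vocab = sorted({w for i in indices for w in texts[i][1]})
        vectors = [[1.0 if w in texts[i][1] else 0.0 for w in vocab] for i in indices]
        labels = kmeans(vectors, min(k, len(indices)))
        for i, lab in zip(indices, labels):
            clusters[(key, lab)].append(i)
    return dict(clusters)

texts = [
    ("J00", ["cough", "fever"]),
    ("J03", ["sore throat", "tonsil inflammation"]),
    ("J00", ["cough", "fever", "no phlegm"]),
    ("K29", ["stomachache"]),
]
clusters = cluster_texts(texts)
print(clusters)
```

With this toy input the two "cough, fever" texts end up in one cluster, while the tonsil text and the stomach text each form their own, so later pairwise comparison only runs inside small groups.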
In any of the foregoing technical solutions, preferably, the storage unit 208 is specifically configured to store the words having an association relationship according to the attribute of the words having an association relationship.
In this technical solution, the words are stored according to the attributes of the words having an association relationship. For example, the attributes of a word include: body part (such as head, limbs), predicate (such as pain, strain), disease (such as fever, heart disease), medicine (such as Gregorian tablets, glucose injection), treatment means (such as drip, anesthesia), and ignored words (such as home, patient, which do not contribute to information extraction). Storing by attribute makes the storage of associated words more orderly.
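Attribute-keyed storage of associated words might look like the sketch below. The nested-dictionary layout, the `ATTRIBUTES` lookup table, and the choice to skip any pair that contains an ignored word are all assumptions made for illustration; the patent only states that words are stored according to their attributes.

```python
from collections import defaultdict

# Hypothetical attribute lookup, using the example words from the description.
ATTRIBUTES = {
    "head": "body part",
    "pain": "predicate",
    "fever": "disease",
    "heart disease": "disease",
    "glucose injection": "medicine",
    "anesthesia": "treatment means",
    "patient": "ignored",
    "home": "ignored",
}

def store_by_attribute(pairs, attributes=ATTRIBUTES):
    """Return {attribute: {word: set of associated words}}, skipping ignored words."""
    lexicon = defaultdict(lambda: defaultdict(set))
    for a, b in pairs:
        if "ignored" in (attributes.get(a), attributes.get(b)):
            continue  # ignored words do not contribute to information extraction
        lexicon[attributes.get(a, "unknown")][a].add(b)
        lexicon[attributes.get(b, "unknown")][b].add(a)
    return lexicon

pairs = [("fever", "heart disease"), ("head", "pain"), ("patient", "fever")]
lexicon = store_by_attribute(pairs)
print(sorted(lexicon))  # ['body part', 'disease', 'predicate']
```

Looking up a word then only requires its attribute and the word itself, for example `lexicon["disease"]["fever"]`.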
Fig. 3 shows a schematic diagram of a medical information processing apparatus according to an embodiment of the invention.
As shown in Fig. 3, the medical information processing apparatus 300 first obtains medical texts from medical professional websites by using crawler technology, and obtains electronic medical records from the medical system of a hospital. Because the amount of information obtained from these two sources is large, the medical texts obtained from the medical professional websites and the electronic medical records are stored in a distributed file system as a plurality of medical texts. Word segmentation and clustering are performed on the plurality of medical texts, and then the association degree of every two medical texts in the same category is calculated from their words by the Jaccard method. For example, for two medical texts A and B, the words of A after segmentation are "patient", "sore and itchy throat", "no phlegm", "stomach distension", "lumbago", and the words of B after segmentation are "dry cough", "sore and itchy throat", "no phlegm", "stomachache", "waist soreness", "fear of cold". Calculation yields the identical word pairs "sore and itchy throat" / "sore and itchy throat" and "no phlegm" / "no phlegm", and the highly similar word pairs "stomach distension" / "stomachache" and "lumbago" / "waist soreness". A vector cosine method is then used to determine whether any two medical texts in the same category have an association relationship, which yields association relationships between some words that cannot be obtained by calculating similarity with the Jaccard method.
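The two similarity measures named above can be sketched on the word lists of texts A and B as follows. The sketch assumes the standard set-based Jaccard coefficient and a cosine over word-count vectors; the patent does not give its exact formulas, and the two terms shared by A and B are spelled identically here so that they match.

```python
import math
from collections import Counter

def jaccard(words_a, words_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(words_a, words_b):
    """Cosine of the angle between the two word-count vectors."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

A = ["patient", "sore and itchy throat", "no phlegm", "stomach distension", "lumbago"]
B = ["dry cough", "sore and itchy throat", "no phlegm", "stomachache", "waist soreness", "fear of cold"]

print(round(jaccard(A, B), 3))  # 0.222  (2 shared words out of 9 distinct words)
print(round(cosine(A, B), 3))   # 0.365  (2 / sqrt(5 * 6))
```

Both functions only see exact matches, so near-synonyms such as "stomach distension" and "stomachache" contribute nothing here; detecting those would need an additional word-level similarity step that the source mentions but does not specify.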
For example, consider the two medical texts A and B above and another medical text C. Calculation shows that medical texts A and C have an association relationship, so the words in A and C also have an association relationship, for example an association involving the word "tonsil inflammation". The words having an association relationship are then stored in a medical word stock, thereby constructing a medical word stock oriented toward actual medical scenarios.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. By analyzing real data (i.e., medical records) in the medical system of a hospital together with medical texts from medical professional websites, words having an association relationship in the medical texts can be mined more accurately and comprehensively, thereby constructing a medical word stock oriented toward actual medical scenarios.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A medical information processing method characterized by comprising:
performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts;
determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;
judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts;
if so, performing association storage on the words with the association relation;
the step of performing association storage on the words with association relations specifically includes:
and storing the words with the association relation according to the attributes of the words with the association relation.
2. The medical information processing method according to claim 1, wherein the step of storing the words having the association relationship in association further includes:
determining the association degree of words in any two medical texts according to the association degree of any two medical texts;
and storing the association degree of the words in any two medical texts.
3. The medical information processing method according to claim 1, wherein the step of segmenting the plurality of medical texts specifically includes:
and performing word segmentation on the medical texts according to the dictionary and the parts of speech of the words in the medical texts.
4. The medical information processing method according to claim 1, wherein the step of clustering the plurality of medical texts specifically includes:
clustering the plurality of medical texts according to international disease classification and K-means algorithm.
5. A medical information processing apparatus characterized by comprising:
the processing unit is used for segmenting a plurality of medical texts and clustering the medical texts;
the first determination unit is used for determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;
the judging unit is used for judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts;
the storage unit is used for performing association storage on the words with the association relation when the judgment result is yes;
the storage unit is specifically configured to store the words with the association relationship according to the attributes of the words with the association relationship.
6. The medical information processing apparatus according to claim 5, wherein the storage unit includes:
the second determining unit is used for determining the association degree of the words in any two medical texts according to the association degree of any two medical texts;
the storage unit is specifically configured to store the association degrees of the words in any two medical texts.
7. The medical information processing apparatus according to claim 5, wherein the processing unit includes:
a word segmentation unit, used for segmenting the medical texts according to the dictionary and the parts of speech of the words in the medical texts.
8. The medical information processing apparatus according to claim 5, wherein the processing unit includes:
and the clustering unit is used for clustering the plurality of medical texts according to the international disease classification and the K-means algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510886242.XA CN106844325B (en) | 2015-12-04 | 2015-12-04 | Medical information processing method and medical information processing apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844325A | 2017-06-13 |
CN106844325B | 2022-01-25 |
Family
ID=59150575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510886242.XA (Active; CN106844325B) | Medical information processing method and medical information processing apparatus | 2015-12-04 | 2015-12-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844325B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019826B (en) * | 2017-07-27 | 2023-02-28 | 北大医疗信息技术有限公司 | Construction method, construction device, equipment and storage medium of medical knowledge map |
CN109192258B (en) * | 2018-08-14 | 2023-06-20 | 深圳平安医疗健康科技服务有限公司 | Medical data conversion method, medical data conversion device, computer equipment and storage medium |
CN110766004B (en) * | 2019-10-23 | 2022-05-13 | 泰康保险集团股份有限公司 | Medical identification data processing method and device, electronic equipment and readable medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005045695A1 (en) * | 2003-10-27 | 2005-05-19 | Educational Testing Service | Method and system for determining text coherence |
CN101079026A (en) * | 2007-07-02 | 2007-11-28 | 北京百问百答网络技术有限公司 | Text similarity, acceptation similarity calculating method and system and application system |
CN102982125A (en) * | 2012-11-14 | 2013-03-20 | 百度在线网络技术(北京)有限公司 | Method and device for identifying texts with same meaning |
CN103123618A (en) * | 2011-11-21 | 2013-05-29 | 北京新媒传信科技有限公司 | Text similarity obtaining method and device |
CN103942339A (en) * | 2014-05-08 | 2014-07-23 | 深圳市宜搜科技发展有限公司 | Synonym mining method and device |
CN104978347A (en) * | 2014-04-11 | 2015-10-14 | 中国中医科学院中医临床基础医学研究所 | Data mining method and data mining system for sensitive keywords in Chinese biomedical literature database |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07319906A (en) * | 1994-05-27 | 1995-12-08 | Fujitsu Ltd | Synonym retrieving processing system and character string retrieving system |
Also Published As
Publication number | Publication date |
---|---|
CN106844325A (en) | 2017-06-13 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
PP01 | Preservation of patent right | Effective date of registration: 20240202; Granted publication date: 20220125 |