CN115545017A - Medical term normalization method and system based on node similarity - Google Patents

Publication number
CN115545017A
Authority
CN
China
Prior art keywords
entity
attribute
word
normalization
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211259564.8A
Other languages
Chinese (zh)
Inventor
李宇萱
李向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Health Medical Big Data Co ltd
Shandong Langchao Intelligent Medical Technology Co ltd
Original Assignee
Shandong Health Medical Big Data Co ltd
Shandong Langchao Intelligent Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Health Medical Big Data Co ltd and Shandong Langchao Intelligent Medical Technology Co ltd
Priority to CN202211259564.8A
Publication of CN115545017A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/027 Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a medical term normalization method and system based on node similarity, belonging to the technical field of data processing. The technical problem to be solved is how to handle coreference resolution and entity disambiguation between entities effectively, and how to complete medical term normalization quickly and accurately. The method comprises the following steps: acquiring medical term normalization words as entity normalization words, and labeling an entity type for each entity normalization word; taking each entity normalization word together with its entity attributes, attribute types, entity type, and the relationship types between the entity normalization word and its entity attributes as a phrase, and using the phrase corresponding to each entity normalization word as one piece of knowledge to construct a knowledge base; and acquiring the entity attributes of a target entity and the relationship types between the target entity and those entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base.

Description

Medical term normalization method and system based on node similarity
Technical Field
The invention relates to the technical field of data processing, in particular to a medical term normalization method and system based on node similarity.
Background
In the medical field, commonly used professional terms include the names of hospital departments, common diagnosis and treatment terms, common disease names, common operation names, common drug names, common clinical medicine terms, common laboratory tests and examinations, and the like. Medical terms of different types, whether they have different names or the same name, may have different attribute relationships, such as applicable populations, indications, drugs used, and the laboratory tests and examinations to be performed, and adding, deleting, or modifying any attribute condition may change the entity word. Based on these attribute conditions and category labels, different entity words can be classified and judged reasonably and accurately.
At present, as the informatization of medical institutions deepens, the demand for application-oriented standards for medical terms keeps growing. As medicine develops, specialized fields are divided ever more finely, and because data structures and modes of expression differ, information cannot be exchanged, shared, integrated, and used consistently among medical institutions and departments. In addition, regional differences and heavy use of colloquial language lead to synonymy (several words with one meaning) and polysemy (one word with several meanings), which greatly hinders the research and analysis of medical information. Term normalization in the medical field is currently progressing slowly. Most of it is done by medical professionals through manual comparison, which consumes a great deal of manpower and effort, takes a long time, and is inefficient. Because of the wide geographic spread and regional differences, communication is difficult, and a unified term normalization system is hard to achieve.
How to effectively solve the problems of coreference resolution and entity disambiguation among entities and how to quickly and accurately complete the normalization of medical terms is a technical problem to be solved urgently at present.
Disclosure of Invention
The technical task of the invention is to provide, in view of the above defects, a medical term normalization method and system based on node similarity, so as to solve the technical problem of how to handle inter-entity coreference resolution and entity disambiguation effectively and complete medical term normalization quickly and accurately.
In a first aspect, the invention relates to a medical term normalization method based on node similarity, which is characterized by comprising the following steps:
acquiring a medical term normalization word as an entity normalization word, and labeling an entity type for each entity normalization word;
for each entity normalization word, acquiring all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with its entity attributes, attribute types, entity type, and the relationship types between the entity normalization word and its entity attributes as a phrase, and using the phrase corresponding to each entity normalization word as one piece of knowledge to construct a knowledge base;
for a target entity to be normalized, acquiring the entity attributes corresponding to the target entity and the relationship types between the target entity and those entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base;
selecting a normalization word entity whose node similarity with the entity attributes of the target entity is greater than a threshold value as a matching normalization entity, and if there are a plurality of matching normalization entities, selecting one entity normalization word by manual judgment as the normalization word entity corresponding to the target entity.
Preferably, if the entity type corresponding to the target entity is known, an entity normalization word of the same entity type as the target entity is screened from a knowledge base, and the node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word is calculated based on the screened entity normalization word.
Preferably, if the number of entity attributes corresponding to the entity normalization word is greater than a threshold value, the entity attributes whose relationship type with the entity normalization word is the same as a relationship type between the target entity and its entity attributes are selected, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes;
and if the number of entity attributes corresponding to the target entity is greater than the threshold value, the N entity attributes that rank highest by proportion in the knowledge base are selected as the entity attributes of the target entity, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes.
Preferably, if the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, it is determined that no normalization word of the target entity exists in the knowledge base; after the target entity is manually verified, the target entity and the corresponding entity type, entity attributes, attribute categories, and relationship types between the target entity and its entity attributes are taken as a phrase, and the phrase is added to the knowledge base as new knowledge.
Preferably, the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base is calculated by the following method: matching the entity attributes of the target entity with the entity attributes of the entity normalization word one by one based on the attribute value and the attribute type of each entity attribute, and taking entity attributes with the same attribute value and attribute type as matched entity attributes;
correspondingly, the node similarity calculation formula is as follows:
Ji = |S0∩Si| / |S0∪Si| = |S0∩Si| / (|S0| + |Si| - |S0∩Si|)
wherein Ji represents the node similarity value between the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, Si represents the entity attribute set of the ith normalization word entity, and |·| denotes the number of elements in a set.
In a second aspect, the present invention provides a node similarity-based medical term normalization system for normalizing target entities by the node similarity-based medical term normalization method according to any one of the first aspect, the system comprising:
the data acquisition module is used for acquiring a medical term normalization word as an entity normalization word and labeling the entity type of each entity normalization word;
the knowledge base construction module is used for acquiring, for each entity normalization word, all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with its entity attributes, attribute types, entity type, and the relationship types between the entity normalization word and its entity attributes as a phrase, and using the phrase corresponding to each entity normalization word as one piece of knowledge to construct a knowledge base;
the normalized entity matching module is used for acquiring, for a target entity to be normalized, the entity attributes corresponding to the target entity and the relationship types between the target entity and those entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base;
and the normalized entity selection module is used for selecting a normalization word entity whose node similarity with the entity attributes of the target entity is greater than a threshold value as a matching normalization entity, and if there are a plurality of matching normalization entities, selecting one entity normalization word by manual judgment as the normalization word entity corresponding to the target entity.
Preferably, if the entity type corresponding to the target entity is known, the normalized entity matching module is configured to screen out an entity normalization word of the same entity type as the target entity from a knowledge base, and calculate a node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word based on the screened entity normalization word.
Preferably, if the number of entity attributes corresponding to the entity normalization word is greater than the threshold value, the normalized entity matching module is configured to perform: selecting the entity attributes whose relationship type with the entity normalization word is the same as a relationship type between the target entity and its entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes;
if the number of entity attributes corresponding to the target entity is greater than the threshold value, the normalized entity matching module is configured to perform: selecting the N entity attributes that rank highest by proportion in the knowledge base as the entity attributes of the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes.
Preferably, if the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, the normalized entity matching module is configured to perform: determining that no normalization word of the target entity exists in the knowledge base, and calling the knowledge base construction module after the target entity is manually checked;
correspondingly, the knowledge base construction module is configured to perform: taking the target entity and the corresponding entity type, entity attributes, attribute categories, and relationship types between the target entity and its entity attributes as a phrase, and adding the phrase to the knowledge base as new knowledge.
Preferably, the normalized entity matching module calculates the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base by the following method: matching the entity attributes of the target entity with the entity attributes of the entity normalization word one by one based on the attribute value and the attribute type of each entity attribute, and taking entity attributes with the same attribute value and attribute type as matched entity attributes;
the node similarity calculation formula is as follows:
Ji = |S0∩Si| / |S0∪Si| = |S0∩Si| / (|S0| + |Si| - |S0∩Si|)
wherein Ji represents the node similarity value between the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, Si represents the entity attribute set of the ith normalization word entity, and |·| denotes the number of elements in a set.
The medical term normalization method and system based on the node similarity have the following advantages:
1. A knowledge base is established, through which each normalization word entity is aligned with its entity attributes; the attribute parameters are then tuned according to the number of attribute values; the attribute set of the target entity is then compared with the attribute set of each normalization word entity by node similarity calculation, and the entity word with the highest node similarity is selected as the normalization word of the target entity, which improves normalization accuracy;
2. For a target entity of known type, entity normalization words of the same type as the target entity are screened from the knowledge base, and the node similarity between the target entity attribute set and each screened normalization word entity attribute set is calculated, so that target entity normalization can be carried out selectively within a specific category, such as diseases, drugs, operations, laboratory tests, or examinations, according to the entity type.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block flow diagram of the medical term normalization method based on node similarity according to embodiment 1;
FIG. 2 is a case flow diagram of the medical term normalization method based on node similarity according to embodiment 1.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not intended to limit the present invention, and the embodiments and technical features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides a medical term normalization method and system based on node similarity, which are used to solve the technical problem of how to handle inter-entity coreference resolution and entity disambiguation effectively and complete medical term normalization quickly and accurately.
Example 1:
The medical term normalization method based on node similarity of this embodiment comprises four steps: medical term normalization word acquisition, knowledge base construction, normalization word matching, and normalization word selection based on the matching result. The four steps are shown in FIG. 1 and detailed below.
S100, acquiring a medical term normalization word as an entity normalization word, and labeling an entity type for each entity normalization word.
Step S100 of this embodiment collects the terms used for medical term normalization. Relevant knowledge data are obtained by collecting authoritative knowledge, such as national standard data sets, journals, and nationally published medical knowledge documents or literature, and are then structured. At present, 18 knowledge sources have been added to the knowledge base to organize the normalization words for terms, including the national Disease Classification and Codes (revised edition) 1.3, the national clinical edition 2.0 of Disease Classification and Codes, common clinical medical terms, the clinical examination item catalogue of medical institutions (2014 edition), the classification and codes of business departments of national medical and health institutions, the national standard for operations 3.0, the national drug administration, clinical pathways, names and classification codes of traditional Chinese medicine syndromes, classification and codes of traditional Chinese medicine diseases, and the medical insurance catalogue.
In this embodiment, a disease is treated as an entity. For example, for the entity normalization word (disease name) common cold, the corresponding entity attributes include nasal congestion, runny nose, sneezing, fever, and headache; for the entity normalization word (disease name) pneumonia, the corresponding entity attributes include nasal obstruction, rhinorrhea, sneezing, fever, general weakness, diarrhea, and vomiting.
S200, for each entity normalization word, acquiring all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with its entity attributes, attribute types, entity type, and the relationship types between the entity normalization word and its entity attributes as a phrase, and using the phrase corresponding to each entity normalization word as one piece of knowledge to construct a knowledge base.
Step S200 of this embodiment constructs the knowledge base. Each entity normalization word corresponds to a plurality of entity attributes, which together form its entity attribute set, and each entity attribute corresponds to an attribute category. For each entity, the relationship types between the entity and its entity attributes include symptom, medication, applicable population, and the like.
In this step, each entity name and its attributes are stored as tuples of the form "entity name-type tag-relationship type-entity attribute-type tag", where "entity name" is the start entity and "entity attribute" is the end entity, that is, one of the attributes of the entity normalization word, as shown in the following table.
TABLE 1 glossary of relationships
[Table 1 appears only as images in the original publication (BDA0003890932250000071, BDA0003890932250000081); its contents are not recoverable here.]
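The tuple layout described for step S200 can be sketched in Python as follows. This is only an illustrative sketch: the class name `KnowledgeTuple`, its field names, and the helper `attribute_sets` are hypothetical and do not come from the patent, and the type label strings in the sample rows are assumed. The sample diseases and symptoms reuse the examples given in step S100.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeTuple:
    """One row: entity name - type tag - relationship type - entity attribute - type tag."""
    entity_name: str     # start entity (the entity normalization word)
    entity_type: str     # type tag of the entity, e.g. "disease"
    relation_type: str   # relationship between entity and attribute, e.g. "symptom"
    attribute: str       # end entity (one attribute value)
    attribute_type: str  # type tag of the attribute


def attribute_sets(rows):
    """Group rows into one attribute set per entity normalization word.

    Each attribute is kept as an (attribute_type, attribute) pair so that
    later matching compares both the attribute value and its type.
    """
    sets = defaultdict(set)
    for row in rows:
        sets[row.entity_name].add((row.attribute_type, row.attribute))
    return dict(sets)


# Sample rows; the type labels are assumed for illustration.
knowledge_base = [
    KnowledgeTuple("common cold", "disease", "symptom", "runny nose", "symptom"),
    KnowledgeTuple("common cold", "disease", "symptom", "sneezing", "symptom"),
    KnowledgeTuple("pneumonia", "disease", "symptom", "sneezing", "symptom"),
    KnowledgeTuple("pneumonia", "disease", "symptom", "diarrhea", "symptom"),
]

sets_by_entity = attribute_sets(knowledge_base)
```

Storing one row per attribute keeps the knowledge base easy to extend: adding a new attribute to a normalization word only appends a row.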
S300, for the target entity to be normalized, acquiring the entity attributes corresponding to the target entity and the relationship types between the target entity and those entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base.
Step S300 performs normalization word matching. In this step, the entity attributes of the target entity are matched one by one against the entity attributes of the entity normalization word based on the attribute value and the attribute type of each entity attribute, and entity attributes with the same attribute value and attribute type are taken as matched entity attributes. Correspondingly, the node similarity calculation formula is as follows:
Ji = |S0∩Si| / |S0∪Si| = |S0∩Si| / (|S0| + |Si| - |S0∩Si|)
wherein Ji represents the node similarity value between the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, Si represents the entity attribute set of the ith normalization word entity, and |·| denotes the number of elements in a set.
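The node similarity formula has the same form as the Jaccard coefficient of two sets, so it can be sketched with Python's built-in `set` type. The function name `node_similarity` and the choice to return 0.0 for two empty sets are assumptions made for illustration; the patent does not say how that edge case is handled.

```python
def node_similarity(target_attrs, candidate_attrs):
    """Return |S0 ∩ Si| / |S0 ∪ Si| for two attribute sets.

    Attributes are (attribute_type, attribute_value) pairs, so two attributes
    match only when both the type and the value are identical.
    """
    s0, si = set(target_attrs), set(candidate_attrs)
    intersection = len(s0 & si)
    union = len(s0) + len(si) - intersection  # equals len(s0 | si)
    if union == 0:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return intersection / union


s0 = {("symptom", "fever"), ("symptom", "sneezing")}
s1 = {("symptom", "fever"), ("symptom", "diarrhea")}
similarity = node_similarity(s0, s1)  # one shared attribute out of three distinct ones
```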
In the calculation process, if the entity type corresponding to the target entity is known, an entity normalization word of the same entity type as the target entity is screened from a knowledge base, and the node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word is calculated based on the screened entity normalization word.
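A minimal sketch of this screening step, assuming the knowledge base exposes a mapping from each entity normalization word to its entity type. The function name `screen_by_entity_type` and the sample entry "aspirin" with type "drug" are hypothetical.

```python
def screen_by_entity_type(entity_types, target_type=None):
    """Return the names of the normalization words to compare against.

    entity_types maps each entity normalization word to its entity type.
    When target_type is None (the target's type is unknown), every word is kept.
    """
    if target_type is None:
        return list(entity_types)
    return [name for name, etype in entity_types.items() if etype == target_type]


entity_types = {
    "hypertension": "disease",
    "coronary atherosclerotic heart disease": "disease",
    "aspirin": "drug",  # hypothetical entry of a different type
}
candidates = screen_by_entity_type(entity_types, "disease")
```

Screening first means the similarity calculation runs only over candidates of the matching type, which reduces both the work and the chance of a cross-type false match.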
If the number of entity attributes corresponding to the entity normalization word is greater than the threshold value, the entity attributes whose relationship type with the entity normalization word is the same as a relationship type between the target entity and its entity attributes are selected, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes.
If the number of entity attributes corresponding to the target entity is greater than the threshold value, the N entity attributes that rank highest by proportion in the knowledge base are selected as the entity attributes of the target entity, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes.
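The two attribute-selection rules can be sketched as below. Several points are assumptions rather than details from the patent: attributes are modeled as (relationship type, value) pairs, "rank highest in the knowledge base" is read as "occur in the most normalization word attribute sets", ties are broken alphabetically, and the function names, entity names, and sample attributes are hypothetical.

```python
from collections import Counter


def restrict_by_relation(candidate_attrs, target_attrs, threshold):
    """If the candidate has more than `threshold` attributes, keep only those
    whose relationship type also occurs among the target's attributes."""
    if len(candidate_attrs) <= threshold:
        return set(candidate_attrs)
    target_relations = {relation for relation, _ in target_attrs}
    return {(rel, val) for rel, val in candidate_attrs if rel in target_relations}


def top_n_target_attrs(target_attrs, kb_attr_sets, threshold, n):
    """If the target has more than `threshold` attributes, keep the n that
    occur most often across the knowledge base (assumed reading of the rule)."""
    if len(target_attrs) <= threshold:
        return set(target_attrs)
    counts = Counter(attr for attrs in kb_attr_sets.values() for attr in attrs)
    ranked = sorted(target_attrs, key=lambda attr: (-counts[attr], attr))
    return set(ranked[:n])


kb = {
    "X1": {("symptom", "fever"), ("medication", "aspirin")},
    "X2": {("symptom", "fever"), ("symptom", "cough")},
}
target = {("symptom", "fever"), ("symptom", "cough"), ("symptom", "rash")}

selected = top_n_target_attrs(target, kb, threshold=2, n=2)
restricted = restrict_by_relation(kb["X1"], target, threshold=1)
```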
If the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, it is determined that no normalization word of the target entity exists in the knowledge base; after the target entity is manually checked, the target entity and the corresponding entity type, entity attributes, attribute categories, and relationship types between the target entity and its entity attributes are taken as a phrase, and the phrase is added to the knowledge base as new knowledge.
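The knowledge base update for an unmatched target might look like the following sketch. The manual check is represented by a caller-supplied callback `verified_by_expert`, which is a stand-in invented for illustration, as are the function name and the sample entity "fracture" with its attribute.

```python
def update_if_unmatched(target_name, target_attrs, kb_attr_sets, verified_by_expert):
    """Add the target to the knowledge base when it shares no attribute with
    any existing normalization word and a human reviewer has approved it.

    Returns True when the knowledge base was updated.
    """
    def similarity(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    all_zero = all(
        similarity(target_attrs, attrs) == 0 for attrs in kb_attr_sets.values()
    )
    if all_zero and verified_by_expert(target_name, target_attrs):
        kb_attr_sets[target_name] = set(target_attrs)
        return True
    return False


kb = {"common cold": {("symptom", "sneezing"), ("symptom", "fever")}}
added = update_if_unmatched(
    "fracture",                      # hypothetical target entity
    {("symptom", "bone pain")},      # hypothetical attribute
    kb,
    verified_by_expert=lambda name, attrs: True,  # stand-in for manual review
)
```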
S400, selecting a normalization word entity whose node similarity with the entity attributes of the target entity is greater than a threshold value as a matching normalization entity, and if there are a plurality of matching normalization entities, selecting one entity normalization word by manual judgment as the normalization word entity corresponding to the target entity.
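Step S400 can be sketched as a threshold filter followed by an optional manual choice. The function name `select_normalization_word`, the callback `ask_reviewer` standing in for manual judgment, and the threshold value 0.5 are all assumptions; the patent does not state a concrete threshold.

```python
def select_normalization_word(similarities, threshold, ask_reviewer):
    """Pick the normalization word for a target entity.

    similarities maps each candidate normalization word to its node
    similarity with the target. Candidates strictly above `threshold` match;
    when several match, `ask_reviewer` (a stand-in for manual judgment)
    chooses among them. Returns None when nothing matches.
    """
    matches = [name for name, score in similarities.items() if score > threshold]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return ask_reviewer(matches)


scores = {
    "hypertension": 3 / 9,
    "coronary atherosclerotic heart disease": 5 / 7,
}
# With an assumed threshold of 0.5 only one candidate matches, so no review is needed.
chosen = select_normalization_word(scores, 0.5, ask_reviewer=lambda names: names[0])
```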
As shown in the example of FIG. 2, the target entity X0 is "coronary heart disease", the normalization word "hypertension" is denoted as entity X1, and the normalization word "coronary atherosclerotic heart disease" is denoted as entity X2.
Determine the attribute set S0 of the target entity X0:
X0: coronary heart disease
S0: { "symptom": "chest pain", "symptom": "chest tightness", "symptom": "chest compression", "medication": "aspirin", "medication": "ACEI", "medication": "beta blocker" }
Determine the attribute sets S1 and S2 of the normalization word entities X1 and X2:
X1: hypertension
S1: { "symptom": "blood pressure rise", "symptom": "chest tightness", "symptom": "dizziness", "medication": "diuretics", "medication": "ACEI", "medication": "beta blocker" }
X2: coronary atherosclerotic heart disease
S2: { "symptom": "chest pain", "symptom": "chest tightness", "related examination": "electrocardiogram", "medication": "aspirin", "medication": "ACEI", "medication": "beta blocker" }
Calculate the intersections of the target entity attribute set S0 with the normalization word attribute sets S1 and S2 as the first sets T11 and T12, and the unions of S0 with S1 and S2 as the second sets T21 and T22, where |T2i| = |S0| + |Si| - |T1i|:
Intersection of S0 and S1: T11: { "symptom": "chest tightness", "medication": "ACEI", "medication": "beta blocker" }
|T11| = |S0∩S1| = 3
Intersection of S0 and S2: T12: { "symptom": "chest pain", "symptom": "chest tightness", "medication": "aspirin", "medication": "ACEI", "medication": "beta blocker" }
|T12| = |S0∩S2| = 5
Union of S0 and S1: T21: { "symptom": "chest pain", "symptom": "chest tightness", "symptom": "chest compression", "medication": "aspirin", "medication": "ACEI", "medication": "beta blocker", "symptom": "blood pressure rise", "symptom": "dizziness", "medication": "diuretics" }
|T21| = |S0∪S1| = 9
Union of S0 and S2: T22: { "symptom": "chest pain", "symptom": "chest tightness", "symptom": "chest compression", "medication": "aspirin", "medication": "ACEI", "medication": "beta blocker", "related examination": "electrocardiogram" }
|T22| = |S0∪S2| = 7
|T1i|/|T2i| is the node similarity between the target entity X0 and the normalization word entity Xi, i.e., Ji = |S0∩Si| / |S0∪Si| = |S0∩Si| / (|S0| + |Si| - |S0∩Si|), which gives the node similarities J1 and J2 between the target entity X0 and the normalization word entities X1 and X2.
Node similarity between coronary heart disease and hypertension: J1 = |T11|/|T21| = 3/9
Node similarity between coronary heart disease and coronary atherosclerotic heart disease: J2 = |T12|/|T22| = 5/7
The node similarities are compared; the greater the node similarity, the closer the normalization word is to the target entity.
J1 < J2, so "coronary heart disease" is normalized to "coronary atherosclerotic heart disease".
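The arithmetic of this worked example can be checked with a short Python sketch. The attribute labels are normalized here so that the same attribute is spelled identically in every set (the shared symptom is written "chest tightness", the shared drug class "ACEI", and the examination relationship "related examination"); that normalization and the helper name `node_similarity` are assumptions made for illustration. `fractions.Fraction` keeps the results exact.

```python
from fractions import Fraction

# Attribute sets as (relationship type, attribute value) pairs.
S0 = {  # target entity X0: coronary heart disease
    ("symptom", "chest pain"), ("symptom", "chest tightness"),
    ("symptom", "chest compression"), ("medication", "aspirin"),
    ("medication", "ACEI"), ("medication", "beta blocker"),
}
S1 = {  # X1: hypertension
    ("symptom", "blood pressure rise"), ("symptom", "chest tightness"),
    ("symptom", "dizziness"), ("medication", "diuretics"),
    ("medication", "ACEI"), ("medication", "beta blocker"),
}
S2 = {  # X2: coronary atherosclerotic heart disease
    ("symptom", "chest pain"), ("symptom", "chest tightness"),
    ("related examination", "electrocardiogram"), ("medication", "aspirin"),
    ("medication", "ACEI"), ("medication", "beta blocker"),
}


def node_similarity(a, b):
    """Exact ratio of intersection size to union size."""
    return Fraction(len(a & b), len(a | b))


J1 = node_similarity(S0, S1)  # 3 shared attributes out of 9 distinct ones
J2 = node_similarity(S0, S2)  # 5 shared attributes out of 7 distinct ones
best = "X2" if J2 > J1 else "X1"
print(J1, J2, best)  # 1/3 5/7 X2
```

Note that `Fraction` reduces 3/9 to 1/3, which is the same value as the J1 given in the text.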
It should be noted that the value range of Ji is [0,1]. If Ji is 1, the two attribute sets are identical, and the normalization word entity Xi can be determined to be the normalization word of the target entity X0. If the node similarity between the target entity attribute set and every normalization word entity attribute set in the knowledge base is 0, it is determined that no normalization word of the target entity exists in the knowledge base; after the target entity is manually checked, the target entity is taken as a new entity normalization word, and the new entity normalization word, its entity type, the corresponding entity attributes and attribute types, and the relationship types between the entity and its attributes are taken as a phrase, which is added to the knowledge base as new knowledge.
Example 2:
The medical term normalization system based on node similarity of this embodiment comprises a data acquisition module, a knowledge base construction module, a normalized entity matching module, and a normalized entity selection module, and can execute the method disclosed in embodiment 1 to normalize a target entity.
The data acquisition module is used for acquiring the medical term normalization word as an entity normalization word and labeling the entity type of each entity normalization word.
In this embodiment, the data acquisition module is configured to obtain relevant knowledge data from authoritative knowledge, such as national standard data sets, journals, and nationally published medical knowledge documents or literature, and to structure the data. At present, 18 knowledge sources have been added to the knowledge base to organize the normalization words for terms, including the national Disease Classification and Codes (revised edition) 1.3, the national clinical edition 2.0 of Disease Classification and Codes, common clinical medical terms, the clinical examination item catalogue of medical institutions (2014 edition), the classification and codes of business departments of national medical and health institutions, the national standard for operations 3.0, the national drug administration, clinical pathways, names and classification codes of traditional Chinese medicine syndromes, classification and codes of traditional Chinese medicine diseases, and the medical insurance catalogue.
A disease is treated as an entity. For example, for the entity normalization word (disease name) common cold, the corresponding entity attributes include nasal congestion, runny nose, sneezing, fever, and headache; for the entity normalization word (disease name) pneumonia, the corresponding entity attributes include nasal obstruction, rhinorrhea, sneezing, fever, general weakness, diarrhea, and vomiting.
For each entity normalization word, the knowledge base construction module is used for acquiring all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with the corresponding entity attributes, attribute types, entity type and the relationship types between the entity normalization word and the entity attributes as a word group, and using the word group corresponding to each entity normalization word as one piece of knowledge to construct the knowledge base.
In this embodiment, each entity normalization word corresponds to a plurality of entity attributes, the plurality of entity attributes corresponding to each entity normalization word form an entity attribute set, and each entity attribute corresponds to an attribute category. For each entity, the relationship types between the entity and its entity attributes include symptoms, medication, applicable population and the like.
In the specific implementation process, the entity name and its entity attributes are stored as tuples of the form entity name - type tag - relationship type - entity attribute - type tag, wherein the entity name is the starting entity and the entity attribute is the terminating entity, namely one of the attributes of the entity entry.
For the target entity to be normalized, the normalized entity matching module is used for acquiring the entity attributes corresponding to the target entity and the relationship types between the target entity and those entity attributes, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base.
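The five-field tuple storage described above could be modelled as follows. This is a minimal sketch under stated assumptions: the class name `Knowledge`, the function `build_knowledge_base`, and the type tags "disease" and "symptom" are hypothetical illustrations, and the patent does not specify a concrete storage engine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Knowledge(NamedTuple):
    """One stored tuple: entity name, type tag, relation type, attribute, type tag."""
    entity_name: str      # starting entity
    entity_type: str      # type tag of the starting entity
    relation_type: str    # e.g. symptom, medication, applicable population
    attribute_value: str  # terminating entity (one attribute of the entry)
    attribute_type: str   # type tag of the attribute


def build_knowledge_base(records: Iterable[Knowledge]) -> Dict[str, List[Knowledge]]:
    """Group tuples by starting entity so each entry holds its attribute set."""
    kb = defaultdict(list)
    for record in records:
        kb[record.entity_name].append(record)
    return dict(kb)


# Illustrative records only; the tags are assumptions, not taken from the patent.
records = [
    Knowledge("common cold", "disease", "symptom", "fever", "symptom"),
    Knowledge("common cold", "disease", "symptom", "headache", "symptom"),
    Knowledge("pneumonia", "disease", "symptom", "fever", "symptom"),
]
kb = build_knowledge_base(records)
```

Grouping by the starting entity makes it straightforward to read off the entity attribute set of each normalization word when similarities are computed later.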
In this embodiment, the normalized entity matching module calculates the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base by the following method: matching the entity attributes of the target entity with the entity attributes of the entity normalization word one by one based on the attribute value and the attribute type of each entity attribute, and taking entity attributes with the same attribute value and the same attribute type as matched entity attributes;
the node similarity calculation formula is as follows:
$$J_i = \frac{|S_0 \cap S_i|}{|S_0 \cup S_i|} = \frac{|S_0 \cap S_i|}{|S_0| + |S_i| - |S_0 \cap S_i|}$$
wherein Ji represents the node similarity value of the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, and Si represents the entity attribute set of the ith normalization word entity.
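As an illustration of the formula, the node similarity is the Jaccard coefficient of two attribute sets. The sketch below assumes that each entity attribute is represented as an (attribute value, attribute type) pair, so that two attributes match only when both fields are equal. The function name `node_similarity`, the sample data, and the choice to return 0 for two empty sets are assumptions, not part of the source.

```python
from typing import Set, Tuple

Attribute = Tuple[str, str]  # (attribute value, attribute type)


def node_similarity(target: Set[Attribute], candidate: Set[Attribute]) -> float:
    """Jaccard similarity |S0 ∩ Si| / |S0 ∪ Si| over (value, type) pairs."""
    union = target | candidate
    if not union:
        # Assumption: two empty attribute sets are treated as "no match".
        return 0.0
    return len(target & candidate) / len(union)


# Illustrative attribute sets (made-up data).
s0 = {("fever", "symptom"), ("sneezing", "symptom"), ("headache", "symptom")}
si = {("fever", "symptom"), ("sneezing", "symptom"), ("vomiting", "symptom")}
score = node_similarity(s0, si)  # 2 shared attributes out of 4 distinct ones
```

Here the two sets share 2 of 4 distinct attributes, so the score is 0.5. An attribute with the same value but a different attribute type does not count as a match.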
In the calculation process, the normalized entity matching module performs the following:
if the entity type corresponding to the target entity is known, the normalization entity matching module is used for screening out an entity normalization word of the same entity type as the target entity from a knowledge base, and calculating the node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word based on the screened entity normalization word;
if the number of entity attributes corresponding to the entity normalization word is greater than a threshold, the normalized entity matching module is used for executing the following steps: based on the relationship types between the entity normalization word and its corresponding entity attributes, selecting the entity attributes whose relationship type is the same as the relationship type between the target entity and the entity attributes corresponding to the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes;
if the number of entity attributes corresponding to the target entity is greater than the threshold, the normalized entity matching module is used for executing the following steps: selecting the N entity attributes that rank highest by proportion in the knowledge base as the entity attributes of the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes;
if the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, the normalized entity matching module is used for executing: determining that no normalization word of the target entity exists in the knowledge base, and calling the knowledge base construction module after the target entity is manually checked.
Correspondingly, the knowledge base construction module is used for executing: taking the target entity together with the corresponding entity type, entity attributes, attribute categories and the relationship types between the target entity and its entity attributes as a word group, and adding the word group to the knowledge base as a new piece of knowledge.
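The candidate-narrowing steps performed by the matching module before the similarity calculation might look like the sketch below. All names (`Attr`, `Entry`, `screen_by_type`, `restrict_by_relation`, `top_n_target_attrs`) and the sample data are hypothetical. In particular, ranking target attributes by how often they occur in the knowledge base is only one possible reading of the attribute-ranking step, which the source does not define precisely.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Attr(NamedTuple):
    relation_type: str
    value: str
    attr_type: str


class Entry(NamedTuple):
    name: str
    entity_type: str
    attrs: List[Attr]


def screen_by_type(kb: List[Entry], target_type: Optional[str]) -> List[Entry]:
    """Keep only entries of the target's entity type when that type is known."""
    if target_type is None:
        return list(kb)
    return [e for e in kb if e.entity_type == target_type]


def restrict_by_relation(entry_attrs: List[Attr], target_attrs: List[Attr],
                         threshold: int) -> List[Attr]:
    """If an entry has more than `threshold` attributes, keep only those whose
    relation type also occurs among the target's attributes."""
    if len(entry_attrs) <= threshold:
        return list(entry_attrs)
    wanted = {a.relation_type for a in target_attrs}
    return [a for a in entry_attrs if a.relation_type in wanted]


def top_n_target_attrs(target_attrs: List[Attr], kb: List[Entry],
                       threshold: int, n: int) -> List[Attr]:
    """If the target has more than `threshold` attributes, keep the N that occur
    most often in the knowledge base (assumed meaning of the ranking step)."""
    if len(target_attrs) <= threshold:
        return list(target_attrs)
    freq = Counter(a for e in kb for a in e.attrs)
    # sorted() is stable, so ties keep the target's original attribute order.
    return sorted(target_attrs, key=lambda a: freq[a], reverse=True)[:n]


# Illustrative data only.
kb = [
    Entry("common cold", "disease",
          [Attr("symptom", "fever", "symptom"), Attr("medication", "drug A", "drug")]),
    Entry("pneumonia", "disease", [Attr("symptom", "fever", "symptom")]),
    Entry("drug A", "drug", []),
]
target = [Attr("symptom", "cough", "symptom"), Attr("symptom", "fever", "symptom")]
```

Each helper returns a new list and leaves the knowledge base untouched, so the three steps can be applied in any combination before the node similarity is computed.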
The normalization entity selection module is used for selecting each normalization word entity whose node similarity with the target entity attributes is greater than a threshold as a matching normalization entity; if there are multiple matching normalization entities, one entity normalization word is selected by manual judgment as the normalization word entity corresponding to the target entity.
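The selection step can be sketched as a threshold filter followed by a manual tie-break. The function name `select_normalization_word` and the `choose_manually` callback, which stands in for the human reviewer, are hypothetical. Returning `None` when nothing exceeds the threshold is an assumption, since the source only describes the case of one or more matches.

```python
from typing import Callable, Dict, List, Optional


def select_normalization_word(
    scores: Dict[str, float],
    threshold: float,
    choose_manually: Callable[[List[str]], str],
) -> Optional[str]:
    """Pick the normalization word for a target entity.

    scores maps each candidate normalization word to its node similarity.
    Candidates strictly above `threshold` are matches; if several match,
    the decision is delegated to `choose_manually`.
    """
    matches = [name for name, score in scores.items() if score > threshold]
    if not matches:
        return None  # assumption: no match above the threshold
    if len(matches) == 1:
        return matches[0]
    return choose_manually(matches)
```

In a real deployment the callback would surface the candidate list to a reviewer; in tests it can be replaced by any deterministic function.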
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed. It will be apparent to those skilled in the art that the means in the various embodiments described above may be combined in various ways to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A medical term normalization method based on node similarity is characterized by comprising the following steps:
acquiring a medical term normalization word as an entity normalization word, and labeling an entity type for each entity normalization word;
for each entity normalization word, acquiring all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with the corresponding entity attributes, attribute types, entity type and relationship types between the entity normalization word and the entity attributes as a phrase, and taking the phrase corresponding to each entity normalization word as a piece of knowledge to construct a knowledge base;
for a target entity to be normalized, acquiring the entity attributes corresponding to the target entity and the relationship types between the target entity and the entity attributes corresponding to the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base;
selecting a normalization word entity with the node similarity to the target entity attribute larger than a threshold value as a matching normalization entity, and if a plurality of matching normalization entities exist, selecting an entity normalization word as a normalization word entity corresponding to the target entity in a manual judgment mode.
2. The node similarity-based medical term normalization method according to claim 1, wherein if the entity type corresponding to the target entity is known, an entity normalization word of the same entity type as the target entity is screened from the knowledge base, and the node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word is calculated based on the screened entity normalization word.
3. The node similarity-based medical term normalization method according to claim 1, wherein if the number of entity attributes corresponding to the entity normalization word is greater than a threshold, entity attributes whose relationship type is the same as the relationship type between the target entity and the entity attributes corresponding to the target entity are selected based on the relationship types between the entity normalization word and its corresponding entity attributes, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes;
and if the number of entity attributes corresponding to the target entity is greater than the threshold, the N entity attributes that rank highest by proportion in the knowledge base are selected as the entity attributes of the target entity, and the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word is calculated based on the selected entity attributes.
4. The node similarity-based medical term normalization method according to claim 1, wherein if the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, it is determined that no normalization word of the target entity exists in the knowledge base; after the target entity is manually checked, the target entity together with the corresponding entity type, entity attributes, attribute categories and relationship types between the target entity and its corresponding entity attributes is taken as a phrase, and the phrase is added to the knowledge base as a new piece of knowledge.
5. The node similarity-based medical term normalization method according to claim 1, wherein the node similarity between the entity attribute of the target entity and the entity attribute of each of the normalization word entities in the knowledge base is calculated by: based on the attribute value of the entity attribute and the attribute type of the entity attribute, matching the entity attribute of the target entity with the entity attribute of the entity normalization word one by one, and taking the entity attribute with the same attribute value and attribute type as the matched entity attribute;
correspondingly, the calculation formula of the node similarity is as follows:
$$J_i = \frac{|S_0 \cap S_i|}{|S_0 \cup S_i|} = \frac{|S_0 \cap S_i|}{|S_0| + |S_i| - |S_0 \cap S_i|}$$
wherein Ji represents the node similarity value of the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, and Si represents the entity attribute set of the ith normalization word entity.
6. A node similarity based medical term normalization system for normalizing target entities by the node similarity based medical term normalization method according to any one of claims 1 to 5, the system comprising:
the data acquisition module is used for acquiring a medical term normalization word as an entity normalization word and labeling an entity type for each entity normalization word;
the knowledge base construction module is used for acquiring, for each entity normalization word, all entity attributes of the entity normalization word and the attribute types corresponding to the entity attributes, taking the entity normalization word together with the corresponding entity attributes, attribute types, entity type and relationship types between the entity normalization word and the entity attributes as a word group, and taking the word group corresponding to each entity normalization word as a piece of knowledge to construct a knowledge base;
the normalized entity matching module is used for acquiring entity attributes corresponding to the target entities and the relationship types between the target entities and the entity attributes corresponding to the target entities, and calculating the node similarity between the entity attributes of the target entities and the entity attributes of all the normalized entities in the knowledge base;
and the normalized entity selection module is used for selecting a normalized word entity with the node similarity to the target entity attribute larger than a threshold value as a matched normalized entity, and if a plurality of matched normalized entities exist, selecting an entity normalization word as a normalization word entity corresponding to the target entity in a manual judgment mode.
7. The system for normalizing medical terms based on node similarity according to claim 6, wherein if the entity type corresponding to the target entity is known, the normalized entity matching module is configured to filter out an entity normalization word of the same entity type as the target entity from the knowledge base, and calculate the node similarity between the entity attribute of the target entity and the entity attribute of the entity normalization word based on the filtered entity normalization word.
8. The node similarity-based medical term normalization system of claim 6, wherein if the number of entity attributes corresponding to the entity normalization word is greater than a threshold, the normalized entity matching module is configured to perform: based on the relationship types between the entity normalization word and its corresponding entity attributes, selecting the entity attributes whose relationship type is the same as the relationship type between the target entity and the entity attributes corresponding to the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes;
if the number of entity attributes corresponding to the target entity is greater than the threshold, the normalized entity matching module is configured to perform: selecting the N entity attributes that rank highest by proportion in the knowledge base as the entity attributes of the target entity, and calculating the node similarity between the entity attributes of the target entity and the entity attributes of the entity normalization word based on the selected entity attributes.
9. The node-similarity-based medical term normalization system of claim 6, wherein if the node similarity between the entity attributes of the target entity and the entity attributes of every normalization word entity in the knowledge base is zero, the normalized entity matching module is configured to perform: determining that no normalization word of the target entity exists in the knowledge base, and calling the knowledge base construction module after the target entity is manually checked;
correspondingly, the knowledge base construction module is used for executing: and taking the target entity and the corresponding entity type, entity attribute, attribute category and the relationship type between the target entity and the corresponding entity attribute as a phrase, and taking the phrase as a new piece of knowledge to update to a knowledge base.
10. The node-similarity-based medical term normalization system according to claim 6, wherein the normalized entity matching module calculates the node similarity between the entity attributes of the target entity and the entity attributes of each normalization word entity in the knowledge base by: matching the entity attributes of the target entity with the entity attributes of the entity normalization word one by one based on the attribute value and the attribute type of each entity attribute, and taking entity attributes with the same attribute value and the same attribute type as matched entity attributes;
the node similarity calculation formula is as follows:
$$J_i = \frac{|S_0 \cap S_i|}{|S_0 \cup S_i|} = \frac{|S_0 \cap S_i|}{|S_0| + |S_i| - |S_0 \cap S_i|}$$
wherein Ji represents the node similarity value of the target entity and the ith normalization word entity, S0 represents the entity attribute set of the target entity, and Si represents the entity attribute set of the ith normalization word entity.
CN202211259564.8A 2022-10-14 2022-10-14 Medical term normalization method and system based on node similarity Pending CN115545017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211259564.8A CN115545017A (en) 2022-10-14 2022-10-14 Medical term normalization method and system based on node similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211259564.8A CN115545017A (en) 2022-10-14 2022-10-14 Medical term normalization method and system based on node similarity

Publications (1)

Publication Number Publication Date
CN115545017A true CN115545017A (en) 2022-12-30

Family

ID=84733944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211259564.8A Pending CN115545017A (en) 2022-10-14 2022-10-14 Medical term normalization method and system based on node similarity

Country Status (1)

Country Link
CN (1) CN115545017A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910250A (en) * 2023-06-28 2023-10-20 北京百度网讯科技有限公司 Knowledge processing method, knowledge processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination