CN111428477A - Diagnostic name standardization method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111428477A
Authority
CN
China
Prior art keywords
diagnosis
names
synonym
diagnostic
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010151747.2A
Other languages
Chinese (zh)
Other versions
CN111428477B (en)
Inventor
汪雪松
干萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Iflytek Medical Information Technology Co ltd
Original Assignee
Anhui Iflytek Medical Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Iflytek Medical Information Technology Co ltd
Priority to CN202010151747.2A
Publication of CN111428477A
Application granted
Publication of CN111428477B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention provides a method and a device for standardizing diagnosis names, electronic equipment and a storage medium. The method comprises: determining a plurality of diagnostic names; adjusting the candidate synonym relationship between every two of the diagnostic names, based on the similarity and the medical relationship between them, to obtain the final synonym relationship between every two diagnostic names; and determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names. The method, device, electronic equipment and storage medium fuse medical knowledge into the determination of the synonym relationship, so that medical knowledge and similarity constrain each other, thereby improving the accuracy and reliability of diagnosis name standardization.

Description

Diagnostic name standardization method, device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a device for standardizing a diagnosis name, electronic equipment and a storage medium.
Background
In the medical field, the diagnosis names that doctors write in medical records are often not standardized, and different diagnosis names can denote the same disease, which makes the medical record content unnecessarily ambiguous. To facilitate later querying, management and use of medical record data, the diagnosis names in medical records need to be standardized.
Currently, diagnosis names are standardized mainly either by manual labeling or by comparing the similarity of two diagnosis names with a preset similarity threshold to decide whether they denote the same disease. The former consumes a large amount of manpower, is inefficient, and yields labeling results that are highly subjective and of low accuracy. The latter depends entirely on the similarity threshold, but current technology cannot guarantee that the threshold is set accurately, so the accuracy and reliability of standardization performed this way cannot be guaranteed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for standardizing a diagnosis name, electronic equipment and a storage medium, which are used to address the low accuracy and reliability of existing approaches to diagnosis name standardization.
In a first aspect, an embodiment of the present invention provides a method for standardizing a diagnosis name, including:
determining a plurality of diagnostic names;
based on the similarity and medical relationship between every two diagnostic names in the plurality of diagnostic names, adjusting the candidate synonym relationship between every two diagnostic names to obtain the final synonym relationship between every two diagnostic names;
and determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relation between every two diagnosis names.
Preferably, the medical relationship between any two diagnosis names includes at least one of a time-series relationship, an upper-lower relationship, and a difference in distribution of onset time between the any two diagnosis names.
Preferably, the adjusting the candidate synonym relationship between every two of the plurality of diagnosis names based on the similarity and the medical relationship between every two of the plurality of diagnosis names to obtain the final synonym relationship between every two of the plurality of diagnosis names specifically includes:
adjusting the candidate synonym relation between every two diagnostic names in the plurality of diagnostic names until the corresponding obtained global function value obtains the maximum value;
taking the candidate synonym relation between every two diagnosis names corresponding to the maximum value as the final synonym relation between every two diagnosis names;
wherein the global function value is determined based on a reference function value and a medical penalty function value for each two diagnostic names; the reference function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between the any two diagnosis names, and the medical penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the medical relationship between the any two diagnosis names.
Preferably, the medical penalty function value comprises at least one of a disease timing penalty function value, a superior-inferior relationship penalty function value, and a time distribution difference penalty function value;
the disease time sequence penalty function value of any two diagnosis names is determined based on the time sequence relation and the candidate synonym relation between the any two diagnosis names;
the penalty function value of the upper and lower relations of any two diagnosis names is determined based on the upper and lower relations between any two diagnosis names and the candidate synonym relation;
the time distribution difference penalty function value of any two diagnosis names is determined based on the time distribution difference of onset between the any two diagnosis names and the candidate synonym relationship.
Preferably, the global function value is determined based on at least one of a reference function value and a medical penalty function value for each two diagnostic names, and a synonym transfer penalty function value, a similarity penalty function value and a similarity threshold penalty function value for each two diagnostic names;
the synonym transfer penalty function value of any two diagnosis names is determined based on the candidate synonym relation between any two diagnosis names and the rest diagnosis names;
the similarity penalty function value of any two diagnosis names is determined based on the candidate synonym relation and the similarity between any two diagnosis names;
the similarity threshold penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between any two diagnosis names and a preset similarity threshold.
Preferably, when the candidate synonym relationship between any two diagnostic names is negative and there are a plurality of transfer diagnostic names having the same candidate synonym relationship with the two diagnostic names, the synonym transfer penalty function value of any two diagnostic names is the maximum value of the penalty corresponding to each transfer diagnostic name;
the penalty score corresponding to any transfer diagnosis name is determined based on the similarity between any transfer diagnosis name and any two diagnosis names.
Preferably, the determining a standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names specifically includes:
determining a plurality of synonym diagnostic name sets based on the final synonym relationship between every two diagnostic names;
if a plurality of bridge diagnosis names exist in any synonym diagnosis name set, taking the bridge diagnosis name with the highest similarity score as the standardized diagnosis name of any synonym diagnosis name set;
otherwise, taking the diagnosis name with the highest similarity score in any synonym diagnosis name set as the standardized diagnosis name of any synonym diagnosis name set;
wherein the similarity score for any diagnostic name is determined based on the similarity between said any diagnostic name and the diagnostic name for which each final synonym relationship is yes.
Preferably, the similarity between any two diagnosis names is determined based on diagnosis attributes corresponding to the any two diagnosis names, and the diagnosis attribute corresponding to any diagnosis name is extracted from medical record data corresponding to the any diagnosis name.
In a second aspect, an embodiment of the present invention provides a diagnostic name normalization apparatus, including:
a diagnosis name determination unit for determining a plurality of diagnosis names;
the synonym relation determining unit is used for adjusting the candidate synonym relation between every two diagnosis names based on the similarity and the medical relation between every two diagnosis names in the plurality of diagnosis names to obtain the final synonym relation between every two diagnosis names;
and the normalization unit is used for determining the normalized diagnosis name corresponding to each diagnosis name based on the final synonym relation between every two diagnosis names.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a bus, where the processor, the communication interface and the memory communicate with one another through the bus, and the processor can call logic instructions in the memory to perform the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the method and device for standardizing diagnosis names, the electronic equipment and the storage medium provided by the embodiment of the invention, the synonym relationship between diagnosis names is adjusted based on the similarity and the medical relationship between every two diagnosis names, and the diagnosis names are standardized on that basis. Medical knowledge is thereby fused into the determination of the synonym relationship, and the medical relationship and the similarity constrain each other, which improves the accuracy and reliability of diagnosis name standardization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for standardizing a diagnostic name according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a final synonym relationship determination method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a standardized diagnostic name determination method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a synonym diagnostic name set having bridge diagnostic names according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a synonym diagnosis name set without a bridge diagnosis name according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a diagnostic name normalization apparatus provided in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the medical field, because medical record data is huge in volume and lacked strict data monitoring in its early stages, the information in medical records is highly redundant and messy. Taking "cold" as an example, the diagnosis names used in practice may be "cold", "acute upper respiratory infection", "upper infection", "acute upper infection", and so on; different diagnosis names may in fact denote the same disease. To facilitate later querying, management and use of medical record data, the diagnosis names in medical records need to be standardized.
Currently, diagnosis names are standardized mainly either by manual labeling or by comparing the similarity of two diagnosis names with a preset similarity threshold to decide whether they denote the same disease. The former consumes a large amount of manpower, is inefficient, and yields labeling results that are highly subjective and of low accuracy. The latter depends entirely on the similarity threshold, but current technology cannot guarantee that the threshold is set accurately, and a relationship between diagnosis names that is determined by the similarity threshold alone is not well founded, so the accuracy and reliability of standardization cannot be guaranteed. Therefore, the embodiment of the invention provides a method for standardizing diagnosis names so as to improve the accuracy and reliability of diagnosis name standardization.
FIG. 1 is a schematic flow chart of a method for standardizing a diagnostic name according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 110: determining a plurality of diagnostic names.
The diagnosis name here is a diagnosis name that needs to be standardized, and the diagnosis name may be extracted from medical record data.
Step 120: adjusting the candidate synonym relationship between every two of the plurality of diagnosis names, based on the similarity and the medical relationship between every two of the plurality of diagnosis names, to obtain the final synonym relationship between every two of the plurality of diagnosis names.
Specifically, for any two diagnostic names, the synonym relationship between them is either "yes" or "no": "yes" indicates that the two diagnostic names are synonyms, and "no" indicates that they are not. The candidate synonym relationship is the synonym relationship before and during the adjustment that is driven by the similarity and the medical relationship between every two diagnostic names, and the final synonym relationship is the synonym relationship once the adjustment is complete. Before the adjustment begins, the candidate synonym relationship between every two diagnostic names may be randomly generated.
The similarity between any two diagnosis names represents how similar the information corresponding to the two diagnosis names is. This information may be the medical record data corresponding to the two diagnosis names, or specific attributes of that medical record data, such as chief complaint, history of present illness, past medical history, allergy history, physical examination, auxiliary examination, and treating department; the embodiments of the present invention do not specifically limit this.
The medical relationship between any two diagnosis names represents how the two diagnosis names are related in the medical field. It may be the upper-lower (hierarchical) relationship of the corresponding diseases within the disease system of the medical field, the order in which the corresponding diseases appear in a patient, or the similarity or difference in the times at which the corresponding diseases commonly occur.
The similarity and the medical relationship between any two diagnostic names are both associated with the synonym relationship between them. For example, the higher the similarity, the higher the probability that the synonym relationship between the two diagnosis names is "yes". If, according to the medical relationship, the diseases corresponding to the two diagnosis names appear in a patient one after the other or concurrently, or if the diseases have an upper-lower relationship, the probability that the synonym relationship between the two diagnosis names is "yes" is reduced.
Here, the adjustment of the candidate synonym relationship is performed by the mutual restriction of the similarity between every two diagnosis names and the medical relationship, thereby avoiding the disadvantage caused by judging the synonym relationship only by the similarity.
For example, consider the two diagnostic names "acute upper respiratory infection" and "viral pharyngitis", whose candidate synonym relationship is "yes" and whose similarity is 70%. In terms of medical relationship, "acute upper respiratory infection" is the upper-level (more general) concept of "viral pharyngitis". So although the similarity between the two is high, the upper-lower relationship constrains the adjustment, and the final synonym relationship obtained is "no".
Step 130: determining a standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names.
Specifically, once the final synonym relationship between every two diagnostic names is determined, several groups of mutually synonymous diagnostic names are obtained; for example, "myelopathy," "spinal cord dysfunction," "myelopathy," and "neurogenic myelopathy" form one group of synonyms. One diagnostic name is selected from each group as the standardized diagnostic name of that group. Applying the same procedure to every group yields the standardized diagnostic name corresponding to each diagnostic name, which achieves the standardization of the diagnostic names.
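The grouping and selection just described can be sketched as follows. This is a minimal illustration rather than the patented procedure: the helper names `synonym_sets` and `standardize`, the use of a union-find structure, and the choice of the sum of similarities to a name's synonyms as its similarity score are all assumptions, and the similarity values in the example are made up.

```python
from collections import defaultdict


def synonym_sets(names, final_synonym):
    """Group names connected by a final synonym relationship of "yes" (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (p, q), is_syn in final_synonym.items():
        if is_syn:
            parent[find(p)] = find(q)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return list(groups.values())


def standardize(names, final_synonym, similarity):
    """Map every name to the member of its synonym set with the highest score.

    Assumption: a name's score is the sum of its similarities to the names
    whose final synonym relationship with it is "yes".
    """
    def sim(p, q):
        return similarity.get((p, q), similarity.get((q, p), 0.0))

    def is_syn(p, q):
        return final_synonym.get((p, q), final_synonym.get((q, p), False))

    mapping = {}
    for group in synonym_sets(names, final_synonym):
        def score(n):
            return sum(sim(n, m) for m in group if m != n and is_syn(n, m))

        standard = max(group, key=score)
        for n in group:
            mapping[n] = standard
    return mapping


# Hypothetical example data (similarity values are illustrative only).
names = ["cold", "acute upper respiratory infection", "upper infection", "viral pharyngitis"]
final_synonym = {
    ("cold", "acute upper respiratory infection"): True,
    ("acute upper respiratory infection", "upper infection"): True,
    ("cold", "upper infection"): True,
    ("acute upper respiratory infection", "viral pharyngitis"): False,
}
similarity = {
    ("cold", "acute upper respiratory infection"): 0.9,
    ("acute upper respiratory infection", "upper infection"): 0.8,
    ("cold", "upper infection"): 0.6,
}
mapping = standardize(names, final_synonym, similarity)
```

In this example "acute upper respiratory infection" has the largest summed similarity within its group and becomes the standardized name of that group, while "viral pharyngitis" forms a group of its own.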
According to the method provided by the embodiment of the invention, the synonym relationship between diagnosis names is adjusted based on the similarity and the medical relationship between every two diagnosis names, and the diagnosis names are standardized on that basis. Medical knowledge is thereby fused into the determination of the synonym relationship, and the medical relationship and the similarity constrain each other, which improves the accuracy and reliability of diagnosis name standardization.
Based on the above embodiment, the medical relationship between any two diagnosis names includes at least one of a time-series relationship, an upper-lower relationship, and a difference in distribution of the onset time between the two diagnosis names.
Specifically, for any two diagnosis names, the time sequence relationship is the order in which the corresponding diseases appear in the same patient. The time sequence relationship can be extracted from a large amount of medical record data: if, in the medical records of the same patient, one diagnosis name always appears within a preceding period before the other diagnosis name appears, the two diagnosis names are most likely not synonyms. The length of the period used to decide whether a time sequence relationship exists between two diagnosis names may be determined from business experience.
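One way to picture this extraction is the sketch below. It is only an illustration under stated assumptions: the record layout `(patient_id, visit_date, diagnosis_name)`, the function name `has_timing_relation`, the 30-day default window, and the `min_ratio` parameter (1.0 meaning "always preceded") are hypothetical and do not come from the source.

```python
from datetime import date, timedelta


def has_timing_relation(records, earlier, later, window_days=30, min_ratio=1.0):
    """Return True if `later` is preceded by `earlier` for the same patient.

    records: iterable of (patient_id, visit_date, diagnosis_name) tuples.
    An occurrence of `later` counts as preceded when `earlier` was recorded
    for the same patient within `window_days` before it.
    """
    by_patient = {}
    for patient, day, name in records:
        by_patient.setdefault(patient, []).append((day, name))

    window = timedelta(days=window_days)
    total = preceded = 0
    for visits in by_patient.values():
        earlier_days = [d for d, n in visits if n == earlier]
        for d, n in visits:
            if n != later:
                continue
            total += 1
            if any(timedelta(0) < d - e <= window for e in earlier_days):
                preceded += 1
    return total > 0 and preceded / total >= min_ratio


# Hypothetical records with placeholder diagnosis names.
records = [
    ("patient-1", date(2020, 1, 1), "diagnosis A"),
    ("patient-1", date(2020, 1, 10), "diagnosis B"),
    ("patient-2", date(2020, 2, 1), "diagnosis A"),
    ("patient-2", date(2020, 2, 20), "diagnosis B"),
]
a_before_b = has_timing_relation(records, "diagnosis A", "diagnosis B")
b_before_a = has_timing_relation(records, "diagnosis B", "diagnosis A")
```

Lowering `min_ratio` below 1.0 would tolerate a few exceptions in noisy records; whether that is appropriate depends on the data and is left open here.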
The upper-lower relationship of any two diagnosis names is the upper-lower relationship of the corresponding diseases within the disease system of the medical field. If the symptom word set corresponding to one diagnosis name contains the symptom word set corresponding to the other, the two diagnosis names may have an upper-lower relationship and are most likely not synonyms.
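The containment test on symptom word sets can be sketched in a few lines. The function name `has_upper_lower_relation` is hypothetical, and two choices are assumptions of this sketch rather than statements from the source: identical sets are not treated as an upper-lower relationship, and empty sets are ignored.

```python
def has_upper_lower_relation(symptoms_p, symptoms_q):
    """Return True if one symptom word set strictly contains the other."""
    p, q = set(symptoms_p), set(symptoms_q)
    if not p or not q:
        return False  # assumption: no evidence without symptom words
    return p != q and (p <= q or q <= p)


# Hypothetical symptom word sets.
general = {"cough", "fever", "sore throat"}
specific = {"sore throat"}
unrelated = {"back pain"}
```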
The difference in onset time distribution of any two diagnosis names is the difference between the distributions of onset times of the corresponding diseases. Onset times may be binned by month, by quarter, or by another time unit. If the distributions of the two diseases over the preset time unit differ significantly, the two diagnosis names are most likely not synonyms.
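As an illustration of comparing onset time distributions, the sketch below bins onset times by month and compares the two histograms. The source does not say how "significant difference" is measured; the total variation distance and the 0.5 threshold used here are assumptions, as are the function names.

```python
from collections import Counter


def monthly_distribution(onset_months):
    """Normalized histogram over months 1..12 from a list of month numbers."""
    if not onset_months:
        raise ValueError("at least one onset month is required")
    counts = Counter(onset_months)
    total = sum(counts.values())
    return [counts.get(m, 0) / total for m in range(1, 13)]


def has_onset_distribution_difference(months_p, months_q, threshold=0.5):
    """Return True if the two monthly onset distributions differ markedly.

    Uses total variation distance (0 = identical, 1 = disjoint) and a
    hypothetical threshold.
    """
    dist_p = monthly_distribution(months_p)
    dist_q = monthly_distribution(months_q)
    distance = 0.5 * sum(abs(a - b) for a, b in zip(dist_p, dist_q))
    return distance > threshold


# Hypothetical onset months for two diagnosis names.
winter_onsets = [12, 1, 2, 1, 12]
summer_onsets = [6, 7, 8, 7]
```

A statistical test on the two histograms would be another reasonable choice; the distance-plus-threshold form is used here only because it is short.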
Based on any of the above embodiments, FIG. 2 is a schematic flow chart of a final synonym relationship determining method provided by the embodiment of the present invention. As shown in FIG. 2, step 120 specifically includes:
Step 121: adjusting the candidate synonym relationship between every two of the plurality of diagnostic names until the resulting global function value reaches its maximum.
Step 122: taking the candidate synonym relationship between every two diagnosis names that corresponds to the maximum as the final synonym relationship between every two diagnosis names.
Wherein the global function value is determined based on the reference function value and the medical penalty function value of every two diagnosis names; the reference function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between the two diagnosis names, and the medical penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the medical relationship between the two diagnosis names.
Specifically, the purpose of adjusting the candidate synonym relationship is to maximize the global function value, and the final synonym relationship between every two diagnostic names is the candidate synonym relationship between every two diagnostic names corresponding to the maximum global function value.
For any two diagnostic names, a reference function value for both may be derived based on the candidate synonym relationship and similarity between the two. Since the higher the similarity is, the higher the probability that the candidate synonym relationship between the two diagnostic names is "yes", and the lower the similarity is, the higher the probability that the candidate synonym relationship between the two diagnostic names is "no", the reference function value determined based on the above rule may be regarded as the score of the rule between the candidate synonym relationship and the similarity, and when the candidate synonym relationship is "yes", the higher the similarity is, the higher the reference function value is.
For any two diagnostic names, the medical penalty function value may be derived from the candidate synonym relationship and the medical relationship between them. The medical penalty function value can be regarded as a penalty score that reflects how the medical relationship constrains the candidate synonym relationship. When the two names have a medical relationship that pushes their candidate synonym relationship toward "no", the medical penalty function value is increased accordingly. For example, if the diseases corresponding to the two diagnosis names appear in a patient one after the other or concurrently, the probability that the candidate synonym relationship between the two diagnosis names is "yes" is reduced; if the candidate synonym relationship is nevertheless "yes", the medical penalty function value is increased.
Based on the reference function value and the medical penalty function value of every two diagnostic names, a global function value can be determined: the higher the reference function value of every two diagnosis names, the higher the global function value; the higher the medical penalty function value of every two diagnosis names, the lower the global function value. With the reference function value and the medical penalty function value constraining each other, the candidate synonym relationship between every two diagnosis names is adjusted to maximize the global function value, and the candidate synonym relationship between every two diagnosis names that corresponds to the maximum global function value is taken as the final synonym relationship between every two diagnosis names.
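The maximization described above can be made concrete with a small sketch. It is a simplified illustration under assumptions: the global function is taken to be the sum over pairs of the reference value minus the medical penalty, the medical relationship is reduced to a single boolean per pair, the penalty constant is hypothetical, and the search is an exhaustive enumeration that is only feasible for a handful of names. The source does not specify the search strategy.

```python
from itertools import combinations, product


def global_value(pairs, exist, similarity, medical_conflict, penalty=1.0):
    """Sum over pairs of (reference value - medical penalty). Assumed form."""
    total = 0.0
    for pair in pairs:
        reference = similarity[pair] * exist[pair]
        medical = penalty if (exist[pair] and medical_conflict.get(pair, False)) else 0.0
        total += reference - medical
    return total


def best_assignment(names, similarity, medical_conflict, penalty=1.0):
    """Try every yes/no assignment of candidate synonym relationships."""
    pairs = list(combinations(names, 2))
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=len(pairs)):
        exist = dict(zip(pairs, bits))
        value = global_value(pairs, exist, similarity, medical_conflict, penalty)
        if value > best_value:
            best, best_value = exist, value
    return best, best_value


# Hypothetical data: A and C are similar but have a conflicting medical relationship.
names = ["A", "B", "C"]
similarity = {("A", "B"): 0.9, ("A", "C"): 0.7, ("B", "C"): 0.2}
medical_conflict = {("A", "C"): True}
final, value = best_assignment(names, similarity, medical_conflict)
```

With only these two terms, every pair without a medical conflict ends up as "yes", even at low similarity (B and C here). The additional similarity-based penalty terms listed among the preferred embodiments would counteract that, but they are not modeled in this sketch.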
According to the method provided by the embodiment of the invention, the medical penalty function value is determined based on the candidate synonym relation and the medical relation among the diagnosis names, so that the medical penalty function value is applied to the determination of the global function value, and the value of the candidate synonym relation is restricted through the medical relation, so that the accurate determination of the synonym relation among the diagnosis names is realized.
Based on any of the above embodiments, the medical penalty function value includes at least one of a disease timing penalty function value, a superior-inferior relationship penalty function value, and a time distribution difference penalty function value; the disease time sequence penalty function value of any two diagnosis names is determined based on the time sequence relation between the two diagnosis names and the candidate synonym relation; the penalty function value of the upper and lower relations of any two diagnosis names is determined based on the upper and lower relations between the two diagnosis names and the candidate synonym relation; the time distribution difference penalty function value of any two diagnosis names is determined based on the time distribution difference of onset between the two diagnosis names and the candidate synonym relationship.
Specifically, if the timing relationship between any two diagnosis names is "present" and the candidate synonym relationship is "yes", the timing relationship contradicts the candidate synonym relationship, and the disease timing penalty function value is set to a preset value so that the disease timing penalty takes effect and constrains the global function value. In all other cases, that is, when the timing relationship is "present" and the candidate synonym relationship is "no", or when the timing relationship is "absent", the disease timing penalty function value is set to zero and the disease timing penalty does not take effect.
If the upper-lower relationship between any two diagnosis names is "present" and the candidate synonym relationship is "yes", the upper-lower relationship contradicts the candidate synonym relationship, and the upper-lower relationship penalty function value is set to a preset value so that the upper-lower relationship penalty takes effect and constrains the global function value. In all other cases, that is, when the upper-lower relationship is "present" and the candidate synonym relationship is "no", or when the upper-lower relationship is "absent", the upper-lower relationship penalty function value is set to zero and the upper-lower relationship penalty does not take effect.
If the difference in onset time distribution between any two diagnosis names is "present" and the candidate synonym relationship is "yes", the difference in onset time distribution contradicts the candidate synonym relationship, and the time distribution difference penalty function value is set to a preset value so that the time distribution difference penalty takes effect and constrains the global function value. In all other cases, that is, when the difference in onset time distribution is "present" and the candidate synonym relationship is "no", or when the difference in onset time distribution is "absent", the time distribution difference penalty function value is set to zero and the time distribution difference penalty does not take effect.
Based on any of the above embodiments, for any two diagnosis names p and q, the candidate synonym relationship between p and q is denoted Exist(p, q), where Exist(p, q) = 1 corresponds to the candidate synonym relationship being "yes" and Exist(p, q) = 0 corresponds to it being "no". The similarity between p and q is denoted Prob(p, q), and the reference function value S of p and q can then be expressed as:
S=Prob(p,q)*Exist(p,q)
Based on any of the above embodiments, the timing relationship between p and q is represented as T(p, q):
Figure BDA0002402690470000101
From this, the disease timing penalty function value S1 of p and q is expressed as:
S1=T(p,q)*(T(p,q)∧Exist(p,q))
wherein T(p, q) ∧ Exist(p, q) represents the logical conjunction of the timing relationship and the candidate synonym relationship: if and only if T(p, q) and Exist(p, q) are both 1, S1 is 1 and the disease timing penalty takes effect; otherwise, S1 is 0 and the disease timing penalty does not take effect.
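The reference function value and the disease timing penalty for a single pair can be sketched as follows. This is only a minimal illustration of the two formulas above, not the implementation of the embodiment; the function names `reference_value` and `timing_penalty` are hypothetical, and T(p, q) and Exist(p, q) are assumed to be encoded as the integers 0 and 1.

```python
def reference_value(prob: float, exist: int) -> float:
    """S = Prob(p, q) * Exist(p, q)."""
    return prob * exist


def timing_penalty(t: int, exist: int) -> int:
    """S1 = T(p, q) * (T(p, q) AND Exist(p, q)), with 0/1 inputs."""
    return t * (t & exist)


# Print the full truth table: the penalty is 1 only when a timing
# relationship exists and the pair is marked as candidate synonyms.
for t in (0, 1):
    for exist in (0, 1):
        print(f"T={t} Exist={exist} S1={timing_penalty(t, exist)}")
```

The time distribution difference penalty S3 has the same form with R(p, q) in place of T(p, q).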
Based on any of the above embodiments, the superior-inferior relationship between p and q is expressed as
Figure BDA0002402690470000102
Figure BDA0002402690470000103
Figure BDA0002402690470000104
Wherein set(p) and set(q) are the symptom word sets of p and q, respectively.
From this, the superior-inferior relationship penalty function value S2 of p and q is expressed as:
Figure BDA0002402690470000105
In the formula,
Figure BDA0002402690470000106
represents the logical conjunction of the superior-inferior relationship and the candidate synonym relationship: if and only if
Figure BDA0002402690470000107
and Exist(p, q) are both 1, S2 is 1 and the superior-inferior relationship penalty takes effect; otherwise, S2 is 0 and the superior-inferior relationship penalty does not take effect.
Based on any of the above embodiments, the difference in onset time distribution between p and q is expressed as R(p, q):
Figure BDA0002402690470000111
From this, the time distribution difference penalty function value S3 of p and q is expressed as:
S3=R(p,q)*(R(p,q)∧Exist(p,q))
In the formula, R(p, q) ∧ Exist(p, q) represents the logical conjunction of the difference in onset time distribution and the candidate synonym relationship: if and only if R(p, q) and Exist(p, q) are both 1, S3 is 1 and the time distribution difference penalty takes effect; otherwise, S3 is 0 and the time distribution difference penalty does not take effect.
Based on any of the above embodiments, the value of R(p, q) can be determined by comparing diff(p, q), the standard deviation of the difference between the onset time distributions of the two diagnosis names, with a preset standard deviation threshold, specifically:
Figure BDA0002402690470000112
wherein diff(p, q) = std(MonthRate(p) - MonthRate(q)); MonthRate(p) and MonthRate(q) respectively denote the incidence of the diagnosis names p and q in each calendar month over a period of years, and both are 12-dimensional vectors.
The standard deviation threshold can be obtained by the following formula:
Figure BDA0002402690470000113
wherein D is a set including each of the diagnosis names, and N is the number of the diagnosis names included in D.
For example, diagnosis name p is acute upper respiratory infection, diagnosis name q is hypertension:
MonthRate(p)
=c(0.08,0.09,0.28,0.21,0.04,0.04,0.02,0.01,0.04,0.04,0.07,0.08)
MonthRate(q)
=c(0.09,0.08,0.06,0.07,0.09,0.09,0.08,0.09,0.10,0.08,0.08,0.09)
From this, diff(p, q) = std(MonthRate(p) - MonthRate(q)) ≈ 0.09.
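The worked example above can be checked with a short script. The sketch below assumes that std is the sample standard deviation (divisor n - 1), which gives roughly 0.09 for these two vectors; the population standard deviation would give a slightly smaller value. The function names, the rule "R(p, q) = 1 when diff(p, q) exceeds the threshold", and the threshold value 0.05 are assumptions made for illustration only.

```python
from statistics import stdev

# Monthly incidence vectors copied from the example above.
month_rate_p = [0.08, 0.09, 0.28, 0.21, 0.04, 0.04,
                0.02, 0.01, 0.04, 0.04, 0.07, 0.08]  # acute upper respiratory infection
month_rate_q = [0.09, 0.08, 0.06, 0.07, 0.09, 0.09,
                0.08, 0.09, 0.10, 0.08, 0.08, 0.09]  # hypertension


def diff(rate_p, rate_q):
    """Sample standard deviation of the element-wise monthly difference."""
    return stdev([a - b for a, b in zip(rate_p, rate_q)])


def onset_distribution_differs(rate_p, rate_q, threshold):
    """Assumed form of R(p, q): 1 if diff exceeds the threshold, else 0."""
    return 1 if diff(rate_p, rate_q) > threshold else 0


print(round(diff(month_rate_p, month_rate_q), 4))
# 0.05 is a hypothetical threshold, not a value given in the text.
print(onset_distribution_differs(month_rate_p, month_rate_q, threshold=0.05))
```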
Based on any of the above embodiments, the global function value is specifically determined based on the reference function value and the medical penalty function value of every two diagnosis names, and at least one of a synonym transfer penalty function value, a similarity penalty function value, and a similarity threshold penalty function value of every two diagnosis names.
The synonym transfer penalty function value of any two diagnosis names is determined based on the candidate synonym relationships between the two diagnosis names and the remaining diagnosis names; the similarity penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between the two diagnosis names; and the similarity threshold penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between the two diagnosis names and a preset similarity threshold.
Specifically, synonymy is transitive: if A and B are synonyms of each other and A and C are synonyms of each other, then B and C are most likely synonyms of each other. Based on this rule, for any two diagnosis names whose candidate synonym relationship is "no", the candidate synonym relationships between each of the two diagnosis names and the remaining diagnosis names are obtained in order to judge whether there is a diagnosis name whose candidate synonym relationship with both of them is "yes". If so, the candidate synonym relationship of the two diagnosis names contradicts the transitivity of synonyms, and a synonym transfer penalty function value is determined so that the synonym transfer penalty takes effect and constrains the global function value; if not, the synonym transfer penalty function value is set to zero and the synonym transfer penalty does not take effect.
The similarity and the candidate synonym relationship have a corresponding relationship, the higher the similarity is, the higher the probability that the candidate synonym relationship between the two diagnosis names is 'yes' is, and the lower the similarity is, the higher the probability that the candidate synonym relationship between the two diagnosis names is 'no' is, and based on the rule, aiming at any two diagnosis names, the global function value is constrained through the similarity punishment. Here, when the candidate synonym relationship is "yes", the higher the similarity is, the smaller the similarity penalty function value is, and the lower the similarity is, the larger the similarity penalty function value is; and when the candidate synonym relationship is 'no', the higher the similarity is, the larger the similarity penalty function value is, and the lower the similarity is, the smaller the similarity penalty function value is.
The corresponding candidate synonym relationship may generally be determined by comparing the similarity with a preset similarity threshold. For any two diagnosis names, when the similarity between the two diagnosis names is greater than a preset similarity threshold, the probability that the candidate synonym relation is 'yes' is higher, and the probability that the candidate synonym relation is 'no' is lower; when the similarity is smaller than a preset similarity threshold, the probability that the candidate synonym relation is 'yes' is small, and the probability that the candidate synonym relation is 'no' is large. Based on the rule, if the candidate synonym relationship is 'no' under the condition that the similarity is greater than the preset similarity threshold, or if the candidate synonym relationship is 'yes' under the condition that the similarity is less than the preset similarity threshold, it is obvious that the candidate synonym is contrary to the comparison rule based on the preset similarity threshold, and a similarity threshold penalty function value is correspondingly set, so that the similarity threshold penalty comes into effect to constrain a global function value; if not, setting the similarity threshold penalty function value to zero, and the similarity threshold penalty is not effective.
The method provided by the embodiment of the invention can apply the medical penalty function value to the determination of the global function value, and can also apply at least one of the synonym transfer penalty function value, the similarity penalty function value and the similarity threshold penalty function value to the determination of the global function value, so that while the value of the candidate synonym relationship is restricted through the medical relationship, the value of the candidate synonym relationship is restricted by applying the similarity judgment and the rule of the synonym relationship, and the accuracy of the final synonym relationship determination is further improved.
Based on any of the above embodiments, when the candidate synonym relationship between any two diagnosis names is "no" and there are a plurality of transfer diagnosis names whose candidate synonym relationship with both of the two diagnosis names is "yes", the synonym transfer penalty function value of the two diagnosis names is the maximum of the penalty scores corresponding to the transfer diagnosis names; the penalty score corresponding to any transfer diagnosis name is determined based on the similarity between that transfer diagnosis name and each of the two diagnosis names.
Specifically, for any two diagnosis names p and q, a transfer diagnosis name r_i is the i-th of the plurality of transfer diagnosis names of p and q, where a transfer diagnosis name of p and q is a diagnosis name whose candidate synonym relationship with both p and q is "yes", that is, Exist(p, r_i) = 1 and Exist(q, r_i) = 1.
If the candidate synonym relationship between p and q is Exist(p, q) = 0, then for any transfer diagnosis name r_i, the penalty score corresponding to r_i is determined based on the similarities between p and q, between p and r_i, and between q and r_i; the higher the similarity Prob(p, r_i) between p and r_i and the similarity Prob(q, r_i) between q and r_i, the higher the penalty score corresponding to r_i.
Based on any of the above embodiments, the penalty score corresponding to r_i is expressed as:
(Prob(p,r_i)+Prob(r_i,q)-Prob(p,q))*(1-Exist(p,q))
Here, if the candidate synonym relationship between p and q is "yes", the penalty score is 0; if it is "no", the penalty score is Prob(p, r_i) + Prob(r_i, q) - Prob(p, q). Taking the maximum over the penalty scores of all transfer diagnosis names gives the synonym transfer penalty function value S4:
S4=|max((Prob(p,r_i)+Prob(r_i,q)-Prob(p,q))*(1-Exist(p,q)))|
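As a rough illustration of the synonym transfer penalty, the sketch below evaluates S4 for one pair from dictionaries of similarities and candidate synonym relationships. The data layout (dictionaries keyed by unordered pairs), the function names, and the three placeholder diagnosis names are assumptions made for this example; returning zero when no transfer diagnosis name exists follows the rule stated earlier that the penalty is then set to zero.

```python
def key(a, b):
    """Unordered pair key, so that (p, q) and (q, p) map to the same entry."""
    return frozenset((a, b))


def transfer_penalty(p, q, names, prob, exist):
    """S4 for the pair (p, q): |max_i(Prob(p,r_i) + Prob(r_i,q) - Prob(p,q))|
    when Exist(p, q) = 0, and 0 when Exist(p, q) = 1 or no r_i exists."""
    if exist[key(p, q)] == 1:
        return 0.0
    scores = [
        prob[key(p, r)] + prob[key(r, q)] - prob[key(p, q)]
        for r in names
        if r not in (p, q) and exist[key(p, r)] == 1 and exist[key(q, r)] == 1
    ]
    return abs(max(scores)) if scores else 0.0


# Hypothetical data: A is marked as a synonym of both B and C,
# but B and C are marked as non-synonyms.
names = ["A", "B", "C"]
prob = {key("A", "B"): 0.9, key("A", "C"): 0.8, key("B", "C"): 0.3}
exist = {key("A", "B"): 1, key("A", "C"): 1, key("B", "C"): 0}

print(transfer_penalty("B", "C", names, prob, exist))
```

In this toy case the pair (B, C) is penalized because A links both names, while the pair (A, B) incurs no transfer penalty.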
Based on any of the above embodiments, the similarity penalty function value S5 of p and q can be expressed as the absolute difference between the candidate synonym relationship Exist(p, q) of p and q and the similarity Prob(p, q) of p and q, specifically:
S5=|Exist(p,q)-Prob(p,q)|
Based on any of the above embodiments, assume that the preset similarity threshold is TH_Global, and let W(p, q) represent the magnitude relationship between the similarity Prob(p, q) of p and q and the preset similarity threshold TH_Global, specifically:
Figure BDA0002402690470000141
the similarity threshold penalty function values S6 for p and q thus obtained are:
Figure BDA0002402690470000142
In the formula,
Figure BDA0002402690470000143
is the exclusive-or (XOR) symbol: when W(p, q) is 1 and Exist(p, q) is 0, or W(p, q) is 0 and Exist(p, q) is 1, that is, when the similarity is greater than the preset similarity threshold and the candidate synonym relationship is "no", or when the similarity is less than the preset similarity threshold and the candidate synonym relationship is "yes", S6 is 1 and the similarity threshold penalty takes effect.
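The two similarity-based penalties can be sketched in a few lines. The behaviour described above (S6 is 1 exactly when W(p, q) and Exist(p, q) differ) corresponds to an exclusive-or, which is what the sketch uses. The function names are hypothetical, W(p, q) is assumed to be 1 when the similarity is strictly greater than the threshold and 0 otherwise, and the threshold 0.5 is an arbitrary illustrative value.

```python
def similarity_penalty(prob: float, exist: int) -> float:
    """S5 = |Exist(p, q) - Prob(p, q)|."""
    return abs(exist - prob)


def threshold_penalty(prob: float, exist: int, th_global: float) -> int:
    """S6 = W(p, q) XOR Exist(p, q), with W = 1 if Prob > TH_Global else 0."""
    w = 1 if prob > th_global else 0
    return w ^ exist


# Illustrative values only.
for prob, exist in [(0.9, 1), (0.9, 0), (0.2, 1), (0.2, 0)]:
    print(prob, exist,
          round(similarity_penalty(prob, exist), 2),
          threshold_penalty(prob, exist, th_global=0.5))
```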
Based on any of the above embodiments, in the method, a global function value may be obtained based on the reference function value S of every two diagnosis names, and the disease timing penalty function value S1, the upper and lower relationship penalty function values S2, the time distribution difference penalty function value S3, the synonym transfer penalty function value S4, the similarity penalty function value S5, and the similarity threshold penalty function value S6, and an objective function for maximizing the global function value may be specifically expressed as the following formula:
maximize(S-α*S1-β*S2-γ*S3-δ*S4-ε*S5-θ*S6)
In the formula, α, β, γ, δ, ε, and θ are preset weights corresponding to S1, S2, S3, S4, S5, and S6, respectively.
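To make the optimization concrete, the sketch below evaluates the objective for every possible assignment of candidate synonym relationships over a tiny set of names and keeps the best one. Several points are assumptions rather than statements of the embodiment: the global function value is taken as the sum of the per-pair expression over all pairs, S2 is assumed to have the same form as S1 and S3, W is assumed to be 1 when the similarity exceeds the threshold, the weight names in the code are placeholders, and exhaustive enumeration is used only because it is simple (it grows as 2 to the power of the number of pairs and is impractical beyond a handful of names). The text does not specify which search procedure is used.

```python
from itertools import combinations, product


def pair(a, b):
    return frozenset((a, b))


def global_value(names, exist, prob, medical, weights, th_global):
    """Sum over all pairs of S - a*S1 - b*S2 - c*S3 - d*S4 - e*S5 - f*S6."""
    a, b, c, d, e_w, f = weights
    total = 0.0
    for p, q in combinations(names, 2):
        k = pair(p, q)
        e, s = exist[k], prob[k]
        t, h, r = medical.get(k, (0, 0, 0))  # timing, superior-inferior, onset flags
        s1, s2, s3 = t * (t & e), h * (h & e), r * (r & e)
        transfer = [prob[pair(p, x)] + prob[pair(x, q)] - s
                    for x in names if x not in (p, q)
                    and exist[pair(p, x)] and exist[pair(x, q)]]
        s4 = abs(max(transfer)) * (1 - e) if transfer else 0.0
        s5 = abs(e - s)
        s6 = (1 if s > th_global else 0) ^ e
        total += s * e - a * s1 - b * s2 - c * s3 - d * s4 - e_w * s5 - f * s6
    return total


def best_assignment(names, prob, medical, weights, th_global):
    """Brute-force search over all 0/1 assignments of Exist."""
    keys = [pair(p, q) for p, q in combinations(names, 2)]
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=len(keys)):
        exist = dict(zip(keys, bits))
        value = global_value(names, exist, prob, medical, weights, th_global)
        if value > best_value:
            best, best_value = exist, value
    return best, best_value


# Hypothetical toy data: B and C have a timing relationship.
names = ["A", "B", "C"]
prob = {pair("A", "B"): 0.9, pair("A", "C"): 0.85, pair("B", "C"): 0.2}
medical = {pair("B", "C"): (1, 0, 0)}
best, value = best_assignment(names, prob, medical, (1, 1, 1, 1, 1, 1), 0.5)
print({tuple(sorted(k)): v for k, v in best.items()}, round(value, 2))
```

With these made-up numbers the search keeps A as a synonym of both B and C but leaves B and C as non-synonyms, because the timing penalty and the low similarity outweigh the transfer penalty.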
Based on any of the above embodiments, fig. 3 is a schematic flowchart of a method for determining a standardized diagnosis name according to an embodiment of the present invention, as shown in fig. 3, step 130 specifically includes:
step 131, determining a plurality of synonym diagnosis name sets based on the final synonym relationship between every two diagnosis names.
Specifically, after the final synonym relationship between every two diagnostic names is obtained, the synonym of each diagnostic name can be determined, so that a plurality of synonym diagnostic name sets are obtained. Here, any synonym diagnosis name set includes a plurality of diagnosis names, and any diagnosis name in the set is a synonym with at least one diagnosis name in the set.
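One way to picture step 131 is as grouping names into connected components of the graph whose edges are the final synonym relationships with value "yes". The sketch below does this with a small union-find structure. It is an illustrative assumption about how the sets could be formed, not a description of the embodiment; the function name and the sample diagnosis names are hypothetical.

```python
def synonym_sets(names, synonym_pairs):
    """Group names so that each name is linked to at least one other name
    of its group by a final synonym relationship of "yes"."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in synonym_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical input: pairs whose final synonym relationship is "yes".
names = ["hypertension", "high blood pressure", "HTN",
         "acute upper respiratory infection", "common cold"]
pairs = [("hypertension", "high blood pressure"),
         ("high blood pressure", "HTN"),
         ("acute upper respiratory infection", "common cold")]

for group in synonym_sets(names, pairs):
    print(sorted(group))
```

A name that has no synonym ends up in a group of size one under this grouping.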
Step 132, if a plurality of bridge diagnosis names exist in any synonym diagnosis name set, the bridge diagnosis name with the highest similarity score is taken as the standardized diagnosis name of that set; otherwise, the diagnosis name with the highest similarity score in the set is taken as the standardized diagnosis name of that set; wherein the similarity score of any diagnosis name is determined based on the similarity between that diagnosis name and each diagnosis name whose final synonym relationship with it is "yes".
Specifically, in any synonym diagnosis name set, if deleting one of the diagnosis names would leave a diagnosis name in the set that is not synonymous with the remaining diagnosis names, the deleted diagnosis name is a bridge diagnosis name. Bridge diagnosis names connect the diagnosis names within the synonym diagnosis name set.
In general, there are two cases in the synonym diagnosis name set, one is that there are a plurality of bridge diagnosis names in the synonym diagnosis name set, and the other is that there are no bridge diagnosis names in the synonym diagnosis name set.
Fig. 4 is a schematic structural diagram of a synonym diagnosis name set having bridge diagnosis names according to an embodiment of the present invention, and Fig. 5 is a schematic structural diagram of a synonym diagnosis name set having no bridge diagnosis name according to an embodiment of the present invention. In Fig. 4 and Fig. 5, each node corresponds to one diagnosis name in the synonym diagnosis name set; a connecting line between two nodes indicates that the final synonym relationship between the two diagnosis names is "yes", that is, the two are synonyms; and the value on the connecting line is the similarity between the two diagnosis names. In Fig. 4, A and B are both bridge diagnosis names; in Fig. 5, no bridge diagnosis name exists.
When the bridge diagnosis names exist, calculating the similarity score of each bridge diagnosis name, and taking the bridge diagnosis name with the highest similarity score as a standardized diagnosis name; when there is no bridge diagnosis name, a similarity score is calculated for each diagnosis name, and the diagnosis name having the highest similarity score is set as the standardized diagnosis name. The similarity score is determined based on the similarity between the diagnosis name and each diagnosis name having the final synonym relationship of "yes", that is, based on the value on the connecting line of the nodes corresponding to the diagnosis name.
Further, the similarity score may be a combination of the similarities between the diagnosis name and each diagnosis name whose final synonym relationship with it is "yes". For example, in Fig. 4, the similarity score of node A is 7.3 and that of node B is 6.67, so the diagnosis name corresponding to node A is taken as the standardized diagnosis name. In Fig. 5, the node with the highest similarity score is A, so the diagnosis name corresponding to node A is taken as the standardized diagnosis name.
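The selection rule of step 132 can be sketched on a small graph. Two assumptions are made here and should be read as such: a bridge diagnosis name is interpreted as a node whose removal leaves the remaining nodes of the set no longer connected, and the similarity score is taken as the sum of the similarities on the node's connecting lines (the text only says "a combination"). The graph, its weights, and the function names are made up for illustration and do not reproduce Fig. 4 or Fig. 5.

```python
def is_connected(nodes, edges):
    """Depth-first search restricted to `nodes`; edges are frozenset pairs."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for edge in edges:
            if cur in edge:
                (other,) = edge - {cur}
                if other in nodes and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == nodes


def bridge_names(nodes, edges):
    """Nodes whose deletion disconnects the rest of the set (assumed definition)."""
    return sorted(n for n in nodes if not is_connected(set(nodes) - {n}, edges))


def similarity_score(node, edges):
    """Assumed score: sum of similarities on the node's connecting lines."""
    return sum(weight for edge, weight in edges.items() if node in edge)


def standardized_name(nodes, edges):
    candidates = bridge_names(nodes, edges) or sorted(nodes)
    return max(candidates, key=lambda n: similarity_score(n, edges))


# Hypothetical synonym diagnosis name set.
nodes = {"A", "B", "C", "D", "E"}
edges = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "E")): 0.85,
}

print(bridge_names(nodes, edges), standardized_name(nodes, edges))
```

When no bridge diagnosis name exists, the same scoring is simply applied to every name in the set.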
Based on any of the above embodiments, the similarity between any two diagnosis names is determined based on the diagnosis attributes corresponding to the two diagnosis names, and the diagnosis attribute corresponding to any diagnosis name is extracted from the medical record data corresponding to the diagnosis name.
Specifically, a large amount of medical record data corresponding to any diagnosis name may be collected in advance, and a plurality of diagnosis attributes corresponding to the diagnosis name may be extracted from the medical record data. Here, the diagnostic attribute may include at least one of a chief complaint, a current medical history, an allergy history, a past medical history, an auxiliary examination, and a visiting department.
The similarity Prob(p, q) between the diagnosis names p and q can be obtained by weighting the similarities of the individual diagnosis attributes of p and q. For example, suppose the diagnosis attributes of p and q include the chief complaint Main, the current medical history Current, the allergy history Allergy, the past medical history Previous, the auxiliary examination Auxiliary, and the visiting department Dep. Taking the chief complaint Main as an example, its similarity Prob_Main(p, q) is calculated as follows:
Figure BDA0002402690470000161
In the formula, p_Main and q_Main represent the chief complaints of p and q, respectively; p_Main ∩ q_Main and p_Main ∪ q_Main are their intersection and union, respectively; and card denotes the number of elements in a set.
Based on similar formulas, the similarity of the current medical history Prob_Current(p, q), the similarity of the allergy history Prob_Allergy(p, q), the similarity of the past medical history Prob_Previous(p, q), the similarity of the auxiliary examination Prob_Auxiliary(p, q), and the similarity of the visiting department Prob_Dep(p, q) can be obtained respectively.
Then, the similarity Prob (p, q) between p and q can be obtained by weighting based on the similarity of each piece of relevant information.
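The attribute-based similarity can be sketched as follows. The exact per-attribute formula is given only as a figure, so the sketch assumes it is the ratio of the size of the intersection to the size of the union of the two term sets (a Jaccard ratio), which matches the symbols explained above. The attribute weights, the term sets, and the function names are hypothetical.

```python
def attribute_similarity(terms_p: set, terms_q: set) -> float:
    """Assumed form: card(intersection) / card(union); 0 if both sets are empty."""
    union = terms_p | terms_q
    return len(terms_p & terms_q) / len(union) if union else 0.0


def weighted_similarity(attrs_p: dict, attrs_q: dict, weights: dict) -> float:
    """Prob(p, q) as a weighted sum of per-attribute similarities."""
    return sum(
        weight * attribute_similarity(attrs_p.get(name, set()), attrs_q.get(name, set()))
        for name, weight in weights.items()
    )


# Hypothetical term sets extracted from medical records for two diagnosis names.
attrs_p = {"Main": {"cough", "fever"}, "Dep": {"respiratory"}}
attrs_q = {"Main": {"cough", "headache"}, "Dep": {"respiratory"}}
weights = {"Main": 0.6, "Dep": 0.4}  # illustrative weights that sum to 1

print(round(weighted_similarity(attrs_p, attrs_q, weights), 3))
```

Choosing weights that sum to 1 keeps Prob(p, q) within the range 0 to 1, which fits its later comparison against Exist(p, q) and a similarity threshold.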
Based on any one of the embodiments, the method for standardizing the diagnosis name comprises the following steps:
First, a large amount of medical record data is acquired, and a plurality of diagnosis names to be standardized, together with the diagnosis attributes corresponding to each diagnosis name, are extracted from it. The similarity between every two diagnosis names is calculated based on their diagnosis attributes. In addition, the timing relationship, the superior-inferior relationship, and the difference in onset time distribution between every two diagnosis names are determined in combination with medical knowledge.
Second, a global function value is calculated based on the similarity, the timing relationship, the superior-inferior relationship, and the difference in onset time distribution between every two diagnosis names, and the candidate synonym relationship between every two diagnosis names is adjusted with the goal of maximizing the global function value, until the maximum of the global function value is obtained.
The candidate synonym relationship between every two diagnosis names corresponding to that maximum is then taken as the final synonym relationship between every two diagnosis names.
After the final synonym relationship between every two diagnostic names is determined, a plurality of sets of synonym diagnostic names can be determined based on the final synonym relationship between every two diagnostic names. Aiming at any synonym diagnosis name set, if the bridge diagnosis name exists, calculating the similarity score of each bridge diagnosis name, and taking the bridge diagnosis name with the highest similarity score as a standardized diagnosis name; otherwise, a similarity score is calculated for each diagnosis name, and the diagnosis name with the highest similarity score is taken as the standardized diagnosis name.
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of a diagnosis name normalizing device provided by an embodiment of the present invention, as shown in fig. 6, the diagnosis name normalizing device includes a diagnosis name determining unit 610, a synonym relation determining unit 620, and a normalizing unit 630;
wherein, the diagnosis name determining unit 610 is used for determining a plurality of diagnosis names;
the synonym relationship determining unit 620 is configured to adjust the candidate synonym relationship between every two diagnostic names based on the similarity and the medical relationship between every two diagnostic names in the plurality of diagnostic names, so as to obtain a final synonym relationship between every two diagnostic names;
the normalization unit 630 is used for determining a normalized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names.
The device provided by the embodiment of the invention adjusts the synonym relation between the diagnosis names based on the similarity and the medical relation between every two diagnosis names, so that the standardization of the diagnosis names is realized, the medical knowledge is integrated into the determination process of the synonym relation, and the synonym relation and the similarity are mutually restricted, thereby improving the accuracy and the reliability of the standardization of the diagnosis names.
Based on any of the above embodiments, the medical relationship between any two diagnosis names includes at least one of a timing relationship, a superior-inferior relationship, and a difference in onset time distribution between the two diagnosis names.
Based on any of the embodiments above, the synonym relationship determining unit 620 is specifically configured to:
adjusting the candidate synonym relation between every two diagnostic names in the plurality of diagnostic names until the corresponding obtained global function value obtains the maximum value;
taking the candidate synonym relation between every two diagnosis names corresponding to the maximum value as the final synonym relation between every two diagnosis names;
wherein the global function value is determined based on a reference function value and a medical penalty function value for each two diagnostic names; the reference function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between the any two diagnosis names, and the medical penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the medical relationship between the any two diagnosis names.
According to any of the above embodiments, the medical penalty function value includes at least one of a disease timing penalty function value, a superior-inferior relationship penalty function value, and a time distribution difference penalty function value;
the disease time sequence penalty function value of any two diagnosis names is determined based on the time sequence relation and the candidate synonym relation between the any two diagnosis names;
the penalty function value of the upper and lower relations of any two diagnosis names is determined based on the upper and lower relations between any two diagnosis names and the candidate synonym relation;
the time distribution difference penalty function value of any two diagnosis names is determined based on the time distribution difference of onset between the any two diagnosis names and the candidate synonym relationship.
According to any of the above embodiments, the global function value is specifically determined based on the reference function value and the medical penalty function value of every two diagnosis names, and at least one of a synonym transfer penalty function value, a similarity penalty function value, and a similarity threshold penalty function value of every two diagnosis names;
the synonym transfer penalty function value of any two diagnosis names is determined based on the candidate synonym relation between any two diagnosis names and the rest diagnosis names;
the similarity penalty function value of any two diagnosis names is determined based on the candidate synonym relation and the similarity between any two diagnosis names;
the similarity threshold penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between any two diagnosis names and a preset similarity threshold.
Based on any of the above embodiments, when the candidate synonym relationship between any two diagnosis names is "no" and there are a plurality of transfer diagnosis names whose candidate synonym relationship with both of the two diagnosis names is "yes", the synonym transfer penalty function value of the two diagnosis names is the maximum of the penalty scores corresponding to the transfer diagnosis names;
the penalty score corresponding to any transfer diagnosis name is determined based on the similarity between that transfer diagnosis name and each of the two diagnosis names.
Based on any of the above embodiments, the normalization unit 630 is specifically configured to:
determining a plurality of synonym diagnostic name sets based on the final synonym relationship between every two diagnostic names;
if a plurality of bridge diagnosis names exist in any synonym diagnosis name set, taking the bridge diagnosis name with the highest similarity score as the standardized diagnosis name of any synonym diagnosis name set;
otherwise, taking the diagnosis name with the highest similarity score in any synonym diagnosis name set as the standardized diagnosis name of any synonym diagnosis name set;
wherein the similarity score of any diagnosis name is determined based on the similarity between that diagnosis name and each diagnosis name whose final synonym relationship with it is "yes".
Based on any of the above embodiments, the similarity between any two diagnosis names is determined based on the diagnosis attributes corresponding to the any two diagnosis names, and the diagnosis attribute corresponding to any diagnosis name is extracted from the medical record data corresponding to the any diagnosis name.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communications interface 720, and the memory 730 communicate with one another via the communication bus 740. The processor 710 may call logic commands in the memory 730 to perform the following method: determining a plurality of diagnosis names; adjusting the candidate synonym relationship between every two diagnosis names based on the similarity and the medical relationship between every two diagnosis names in the plurality of diagnosis names, to obtain the final synonym relationship between every two diagnosis names; and determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names.
In addition, the logic commands in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the logic commands are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of commands for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, the method including: determining a plurality of diagnosis names; adjusting the candidate synonym relationship between every two diagnosis names based on the similarity and the medical relationship between every two diagnosis names in the plurality of diagnosis names, to obtain the final synonym relationship between every two diagnosis names; and determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for standardizing a diagnostic name, comprising:
determining a plurality of diagnostic names;
based on the similarity and medical relationship between every two diagnostic names in the plurality of diagnostic names, adjusting the candidate synonym relationship between every two diagnostic names to obtain the final synonym relationship between every two diagnostic names;
and determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relation between every two diagnosis names.
2. The method of standardizing diagnosis names according to claim 1, wherein the medical relationship between any two diagnosis names includes at least one of a time sequence relationship, an upper and lower (hypernym-hyponym) relationship, and a difference in onset time distribution between those two diagnosis names.
3. The method according to claim 1 or 2, wherein the adjusting the candidate synonym relationship between each two of the plurality of diagnosis names based on the similarity and medical relationship between each two of the plurality of diagnosis names to obtain the final synonym relationship between each two of the plurality of diagnosis names specifically comprises:
adjusting the candidate synonym relationship between every two diagnosis names in the plurality of diagnosis names until the resulting global function value reaches its maximum value;
taking the candidate synonym relationship between every two diagnosis names corresponding to the maximum value as the final synonym relationship between every two diagnosis names;
wherein the global function value is determined based on a reference function value and a medical penalty function value for every two diagnosis names; the reference function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between those two diagnosis names, and the medical penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the medical relationship between those two diagnosis names.
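The search described in claim 3 can be illustrated with a minimal sketch. The claim does not fix the form of the reference function or the medical penalty function, so both forms below are assumptions made only for illustration, as are the names `maximize_global_function`, `conflicting_pairs`, and `penalty_weight`. The sketch enumerates every candidate synonym relationship and keeps the one whose global function value is largest.

```python
from itertools import combinations, product


def maximize_global_function(names, similarity, conflicting_pairs, penalty_weight=1.0):
    """Return (final_relation, max_value) by exhaustive search.

    names: list of diagnosis names.
    similarity: dict mapping frozenset({a, b}) to a similarity in [0, 1].
    conflicting_pairs: set of frozenset({a, b}) whose medical relationship
        argues against synonymy (hypothetical input, not from the claims).
    """
    pairs = [frozenset(pair) for pair in combinations(names, 2)]
    best_value, best_relation = float("-inf"), None
    # Each candidate relation assigns yes/no to every pair of names.
    for flags in product((False, True), repeat=len(pairs)):
        value = 0.0
        for pair, is_synonym in zip(pairs, flags):
            sim = similarity[pair]
            # Assumed reference function: reward "yes" on similar pairs
            # and "no" on dissimilar pairs.
            value += (sim - 0.5) if is_synonym else (0.5 - sim)
            # Assumed medical penalty: a fixed cost for marking a
            # medically conflicting pair as synonymous.
            if is_synonym and pair in conflicting_pairs:
                value -= penalty_weight
        if value > best_value:
            best_value, best_relation = value, dict(zip(pairs, flags))
    return best_relation, best_value
```

Exhaustive enumeration grows as 2 to the power of the number of pairs, so this form is only practical for a handful of names; it is shown here to make the objective concrete, not as the optimization procedure of the embodiments.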
4. The method of standardizing a diagnosis name according to claim 3, wherein the medical penalty function value includes at least one of a disease time sequence penalty function value, an upper and lower (hypernym-hyponym) relationship penalty function value, and a time distribution difference penalty function value;
the disease time sequence penalty function value of any two diagnosis names is determined based on the time sequence relationship and the candidate synonym relationship between those two diagnosis names;
the upper and lower relationship penalty function value of any two diagnosis names is determined based on the upper and lower relationship and the candidate synonym relationship between those two diagnosis names;
the time distribution difference penalty function value of any two diagnosis names is determined based on the difference in onset time distribution and the candidate synonym relationship between those two diagnosis names.
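As a rough illustration of how the three components in claim 4 might combine, the sketch below sums them with weights. The weighted sum, the weights, the field names of `MedicalRelation`, and the choice to apply the penalty only when the candidate synonym relationship is yes are all assumptions; the claim only states which inputs each component depends on.

```python
from dataclasses import dataclass


@dataclass
class MedicalRelation:
    has_time_order: bool   # one diagnosis typically precedes the other
    is_hierarchical: bool  # one diagnosis is a broader or narrower term of the other
    onset_time_gap: float  # difference between onset time distributions, assumed in [0, 1]


def medical_penalty(is_synonym, relation, w_time=1.0, w_hier=1.0, w_dist=1.0):
    """Assumed medical penalty for one pair of diagnosis names."""
    # Assumption: only pairs marked as synonyms can contradict medical knowledge.
    if not is_synonym:
        return 0.0
    return (
        w_time * relation.has_time_order      # disease time sequence penalty
        + w_hier * relation.is_hierarchical   # upper and lower relationship penalty
        + w_dist * relation.onset_time_gap    # time distribution difference penalty
    )
```

For example, a pair marked as synonymous that has a known time order and an onset distribution gap of 0.25 receives a penalty of 1.25 under the default weights.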
5. The method according to claim 3, wherein the global function value is determined based on the reference function value and the medical penalty function value for every two diagnosis names, together with at least one of a synonym transfer penalty function value, a similarity penalty function value, and a similarity threshold penalty function value for every two diagnosis names;
the synonym transfer penalty function value of any two diagnosis names is determined based on the candidate synonym relationships between each of those two diagnosis names and the remaining diagnosis names;
the similarity penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between those two diagnosis names;
the similarity threshold penalty function value of any two diagnosis names is determined based on the candidate synonym relationship and the similarity between those two diagnosis names, and a preset similarity threshold.
6. The method according to claim 5, wherein, when the candidate synonym relationship between any two diagnosis names is no and there are a plurality of transfer diagnosis names having the same candidate synonym relationship with both of those two diagnosis names, the synonym transfer penalty function value of those two diagnosis names is the maximum of the penalty scores corresponding to the transfer diagnosis names;
the penalty score corresponding to any transfer diagnosis name is determined based on the similarity between that transfer diagnosis name and each of those two diagnosis names.
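A minimal sketch of the synonym transfer penalty in claim 6 follows. It rests on two assumptions that the claim does not state: a transfer diagnosis name is read as a third name whose candidate synonym relationship is yes with both names of the pair, and its penalty score is taken as the smaller of its two similarities. The function and parameter names are hypothetical.

```python
def synonym_transfer_penalty(a, b, names, synonym_pairs, similarity):
    """Assumed transfer penalty for the pair (a, b).

    synonym_pairs: set of frozenset({x, y}) whose candidate relationship is yes.
    similarity: dict mapping frozenset({x, y}) to a similarity value.
    """
    # No transfer penalty applies when the pair itself is marked synonymous.
    if frozenset((a, b)) in synonym_pairs:
        return 0.0
    # Transfer names: marked synonymous with both a and b (assumed reading).
    transfer_names = [
        t for t in names
        if t not in (a, b)
        and frozenset((t, a)) in synonym_pairs
        and frozenset((t, b)) in synonym_pairs
    ]
    if not transfer_names:
        return 0.0
    # Assumed penalty score per transfer name: the weaker of its two
    # similarities; the pair's penalty is the maximum over transfer names.
    return max(
        min(similarity[frozenset((t, a))], similarity[frozenset((t, b))])
        for t in transfer_names
    )
```

The penalty discourages relations in which two names are each synonymous with a common third name but not with each other, which would break the transitivity expected of synonymy.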
7. The method according to claim 1, wherein the determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names specifically comprises:
determining a plurality of synonym diagnostic name sets based on the final synonym relationship between every two diagnostic names;
if a plurality of bridge diagnosis names exist in any synonym diagnosis name set, taking the bridge diagnosis name with the highest similarity score as the standardized diagnosis name of that synonym diagnosis name set;
otherwise, taking the diagnosis name with the highest similarity score in that synonym diagnosis name set as the standardized diagnosis name of that synonym diagnosis name set;
wherein the similarity score of any diagnosis name is determined based on the similarity between that diagnosis name and each diagnosis name with which its final synonym relationship is yes.
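The selection step in claim 7 can be sketched as follows. Several points are assumptions: synonym diagnosis name sets are taken to be connected components of the final synonym relationship, the similarity score is taken to be the sum of similarities to the names marked yes, and the bridge diagnosis names are passed in as a given set (`bridge_names`, a hypothetical parameter), since the claims do not define how they are identified.

```python
from collections import defaultdict


def synonym_sets(names, synonym_pairs):
    """Group names into connected components of the final synonym graph."""
    graph = defaultdict(set)
    for a, b in synonym_pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        stack, group = [name], []
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            group.append(current)
            for neighbor in graph[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups, graph


def pick_standard_name(group, graph, similarity, bridge_names=frozenset()):
    """Choose the standardized diagnosis name for one synonym set."""
    def score(name):
        # Assumed similarity score: sum over names whose final relation is yes.
        return sum(similarity[frozenset((name, other))] for other in graph[name])

    bridges = [name for name in group if name in bridge_names]
    # Per the claim wording, bridge names take precedence only when there
    # is a plurality of them; otherwise the whole set competes.
    pool = bridges if len(bridges) > 1 else group
    return max(pool, key=score)
```

A name that stands alone in its set has a score of zero and is returned as its own standardized name.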
8. A diagnostic name standardization apparatus, comprising:
a diagnosis name determination unit for determining a plurality of diagnosis names;
a synonym relationship determination unit for adjusting the candidate synonym relationship between every two diagnosis names based on the similarity and the medical relationship between every two diagnosis names in the plurality of diagnosis names, to obtain the final synonym relationship between every two diagnosis names;
and a standardization unit for determining the standardized diagnosis name corresponding to each diagnosis name based on the final synonym relationship between every two diagnosis names.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method of standardizing a diagnostic name as claimed in any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of standardizing a diagnostic name according to any one of claims 1 to 7.
CN202010151747.2A 2020-03-06 2020-03-06 Diagnostic name standardization method, device, electronic equipment and storage medium Active CN111428477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151747.2A CN111428477B (en) 2020-03-06 2020-03-06 Diagnostic name standardization method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010151747.2A CN111428477B (en) 2020-03-06 2020-03-06 Diagnostic name standardization method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111428477A (en) 2020-07-17
CN111428477B (en) 2023-10-17

Family

ID=71547605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151747.2A Active CN111428477B (en) 2020-03-06 2020-03-06 Diagnostic name standardization method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111428477B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009075661A (en) * 2007-09-18 2009-04-09 Fuji Xerox Co Ltd Similar disease name selection apparatus
JP2013033502A (en) * 2012-10-25 2013-02-14 Fuji Xerox Co Ltd Disease name selection apparatus
CN109697286A (en) * 2018-12-18 2019-04-30 众安信息技术服务有限公司 A kind of diagnostic standardization method and device based on term vector
CN109994215A (en) * 2019-04-25 2019-07-09 清华大学 Disease automatic coding system, method, equipment and storage medium
CN110032728A (en) * 2019-02-01 2019-07-19 阿里巴巴集团控股有限公司 The standardized conversion method of disease name and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732681A (en) * 2021-04-01 2021-04-30 壹药网科技(上海)股份有限公司 Data platform migration method and system
CN112732681B (en) * 2021-04-01 2021-06-08 壹药网科技(上海)股份有限公司 Data platform migration method and system

Also Published As

Publication number Publication date
CN111428477B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
WO2020124856A1 (en) Diagnosis standardization method and device based on word vectors
US11956272B2 (en) Identifying legitimate websites to remove false positives from domain discovery analysis
WO2021012637A1 (en) Intelligent follow-up method and system based on radiographic image report, device, and storage medium
US8438182B2 (en) Patient identification
WO2018000770A1 (en) Medical assistance system and method based on clinical pathway
WO2020082788A1 (en) Medical data processing method, apparatus and device, and storage medium
CN108682457B (en) Patient long-term prognosis quantitative prediction and intervention system and method
WO2020082804A1 (en) Medical data classified storage method and apparatus
WO2020119097A1 (en) Data standardization processing method and device, and storage medium
WO2021180245A1 (en) Server, data processing method and apparatus, and readable storage medium
WO2020042503A1 (en) Verification method and apparatus for risk management system, and device and storage medium
WO2019080502A1 (en) Voice-based disease prediction method, application server, and computer readable storage medium
CN111968750B (en) Server, data processing method, data processing device and readable storage medium
WO2019075972A1 (en) Method and apparatus for generating no claims based discount data
WO2023124837A1 (en) Inquiry processing method and apparatus, device, and storage medium
CN111428477A (en) Diagnostic name standardization method, device, electronic equipment and storage medium
CN111724269A (en) Machine learning-based settlement data processing method and device
WO2020087971A1 (en) Prediction model-based hospitalization rationality prediction method and related products
Escobar-Bach et al. Nonparametric estimation of conditional cure models for heavy-tailed distributions and under insufficient follow-up
US20200311390A1 (en) Face recognition method and apparatus, server, and storage medium
CN116246749A (en) Endocrine patient personalized health management system integrating electronic medical records
CN115762704A (en) Prescription auditing method, device, equipment and storage medium
WO2019084864A1 (en) Method and apparatus for evaluating electronic medical record
CN111462895A (en) Auxiliary diagnosis method and system
CN110570943A (en) method and device for intelligently recommending MDT (minimization of drive test) grouping, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Applicant after: Anhui Xunfei Medical Co.,Ltd.

Address before: 230088 room 288, building H2, phase II, innovation industrial park, 2800 innovation Avenue, high tech Zone, Hefei City, Anhui Province

Applicant before: ANHUI IFLYTEK MEDICAL INFORMATION TECHNOLOGY CO.,LTD.

CB02 Change of applicant information

Address after: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Applicant after: IFLYTEK Medical Technology Co.,Ltd.

Address before: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Applicant before: Anhui Xunfei Medical Co.,Ltd.

GR01 Patent grant