CN116312915A - Method and system for standardized association of drug terms in electronic medical records - Google Patents


Info

Publication number
CN116312915A
Authority
CN
China
Prior art keywords
drug
term
medicine
terms
electronic medical
Prior art date
Legal status
Granted
Application number
CN202310567874.4A
Other languages
Chinese (zh)
Other versions
CN116312915B (en)
Inventor
李劲松
马爽
杨宗峰
王昱
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310567874.4A
Publication of CN116312915A
Application granted
Publication of CN116312915B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method and a system for standardized association of drug terms in electronic medical records. A drug term library is updated through synonym mining, which addresses the low semantic similarity between standard drug terms in the library and external drug terms in electronic medical records. When external drug terms in an electronic medical record are associated with standard drug terms in the synonym-mining-updated library, the semantic information is enriched by adding the pinyin character sequence of each drug term in addition to its Chinese characters, and the graph structure information of both the drug term library and the external drug terms in the electronic medical record is fully exploited. An association prediction model based on semantic embedding and structural embedding is constructed, so that associations between external drug terms in real-world electronic medical records and standard drug terms in the drug term library are established accurately.

Description

Method and system for standardized association of drug terms in electronic medical records
Technical Field
The invention belongs to the technical field of medical informatics, and particularly relates to a method and a system for standardized association of drug terms in electronic medical records.
Background
With the development of information technology and its increasingly deep application in the healthcare industry, large amounts of data have accumulated in the field. Typical examples include knowledge bases (KBs), which present information in a relatively standardized form, and electronic health records (EHRs), which record real-world medical process data. A knowledge base is a technique used by computer systems to store complex structured and unstructured information; a term base (TB) is a special type of knowledge base that stores term concepts and their related information. In drug research, general-purpose drug term bases that have been built and are still being updated include DrugBank and WHODrug, and academia and industry also need a Chinese drug term base. However, in real-world clinical practice, different regions, and even different hospitals and doctors, may use a variety of names for the same drug, and existing drug term libraries do not necessarily record all of them. For example, the drug with DrugBank ID DB00736 has the English name "Esomeprazole magnesium"; its official Chinese name and its other Chinese names (all rendered in English as "esomeprazole magnesium") differ because the drug's common name was revised. In an EHR system the drug may therefore be recorded under the old common name before the revision and under the new common name after it. When EHR data are used for real-world drug research, missing any one of these names leads to incomplete retrieval, which in turn causes unreasonable screening of the study population and miscounting of drug use, ultimately degrading study quality.
Therefore, when EHR data are used for real-world drug research, especially multi-center studies involving multiple drugs, drug names in the EHRs must be associated with the corresponding drugs in a drug term library; this is an important precondition for research quality and for the reliability of the results. The drug term library is an important information resource in medical research and engineering, and keeping it up to date underpins information exchange and technical progress in the field. Associating the library with real-world EHR data lets it provide foundational support for EHR-based research and engineering tasks in natural language processing, artificial intelligence, expert systems, real-world drug research and related areas.
Among existing drug association methods, the general-model-based medical standard term management system and method (publication number CN115080751A) maps medical record text to standard terms. It first splits the medical record text into subdivided attributes with a sequence labeling model, then computes the semantic similarity between the text and each candidate standard word and uses it to judge whether a standardized mapping is valid. A valid mapping is used directly as the result; otherwise, other possible mappings are recomputed, and the final mapping is returned as an algorithm-recommended result that requires manual review. However, this scheme judges mapping validity by semantic similarity alone and ignores the structural characteristics of the drug term library.
The method and device for matching drug names (publication number CN112711642A) matches drugs across different electronic medical record systems. Word vectors are trained on an electronic medical record corpus, drug names are extracted with the Unified Medical Language System (UMLS) to obtain drug entity word vectors, a neural network model produces component vectors, and these are combined with engineered features to compute the similarity between drug entities, finally matching drugs across EHR systems. Provided UMLS coverage is complete, this scheme solves drug matching between different EHR datasets, but it offers limited reference value for the problem addressed by the present invention, namely matching drug terms in EHRs to a drug term library.
The drug information matching method and system (publication number 107103048B) matches drugs against standard information. It first obtains sub-information of the drug to be matched in multiple dimensions, such as drug name, preparation specification and dosage form, and identifies the degree of association between the target sub-information and the standard sub-information. When the result meets a preset association requirement, the qualifying target information is paired with one or more pieces of standard information to form candidate information pairs. For each candidate pair, the similarity between the target and standard information is computed on each dimension, a comprehensive matching score is derived from these similarities, and the standard information of the candidate pair with the highest comprehensive score is taken as the match for the target information.
The medical drug matching method, device, electronic equipment and storage medium (publication number CN111798969A) matches a target drug against a drug standard library. For the target drug, several drug identifiers or specifications that characterize it are selected from its drug information as reference items, and each reference item is assigned a weight according to its importance. The reference items are matched against the standard items in the drug reference library to compute comparison values, and the matching degree between the target drug and each drug in the reference library is computed from the comparison values and weights, establishing a mapping between the target drug's identifiers and its standard identifiers in the reference library. Both schemes solve the problem of matching a target drug product against a drug standard library. Compared with drug substances, drug products carry more sub-information: besides the drug name, they include multi-dimensional information such as preparation specification, dosage form, manufacturer and approval number. These methods do not apply here, because the present invention deals with drug substances, for which the available text information is limited.
The limitations of the prior art are mainly as follows: (1) only semantic similarity is used during association, and graph structure information is not exploited; (2) the semantic similarity does not use pinyin information. Because a drug name may be written with different characters that share the same pronunciation, relying only on the semantic similarity of Chinese names gives inaccurate association results. For example, two Chinese spellings of cefradine that differ only in a homophonous character may not be recognized as the same drug from their Chinese characters alone, whereas their identical pinyin shows at once that they denote the same drug.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method and a system for standardized association of drug terms in electronic medical records, which associate external drug terms in electronic medical records with standard drug terms in a drug term library.
The object of the invention is achieved through the following technical solution:
According to a first aspect of the present specification, a method for standardized association of drug terms in an electronic medical record is provided, comprising:
S1, inputting a drug term library and obtaining the synonym set of each standard drug term;
S2, obtaining a drug term library updated by synonym mining, comprising:
constructing a corpus for synonym mining and acquiring a drug term list from the corpus;
training a synonym set classifier to obtain a classification prediction for each drug term in the drug term list against each synonym set in the drug term library, and obtaining all synonym sets updated by synonym mining according to a preset probability threshold;
updating the drug term library according to all synonym sets updated by synonym mining;
S3, training an association prediction model based on semantic embedding and structural embedding from the updated drug term library and the external drug terms in the electronic medical record, comprising:
obtaining, through a pre-trained language model, the semantic embedding representation of each pair consisting of an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: the external drug term and its pinyin character sequence, and the standard drug term and its pinyin character sequence, are combined with a start character and separator characters to form the character sequence of the drug term pair, which is input into the pre-trained language model to obtain the semantic embedding representation;
obtaining, through a graph convolutional neural network model, the structural embedding representation of each pair consisting of an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: establishing candidate association relationships between the external drug terms and the drug terms in the updated drug term library based on similarity calculation; taking the semantic embedding representations of the external drug terms and of the drug terms in the updated library as the initial node embeddings of the corresponding drug terms; inputting them into the graph convolutional neural network model to obtain the node embeddings of the corresponding drug terms; and taking the product of the node embeddings of the external drug term and the standard drug term as the structural embedding representation;
S4, using the association prediction model to predict the association between each external drug term in the electronic medical record and the standard drug terms in the drug term library.
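The pair construction in S3 can be illustrated with a minimal Python sketch. It assumes BERT-style "[CLS]"/"[SEP]" tokens for the start and separator characters, a fixed segment order (external term, its pinyin, standard term, its pinyin), and that the pinyin is supplied by the caller, for example from a separate pinyin conversion step; build_pair_sequence is a hypothetical helper, not part of the patent.

```python
from typing import List

# Hypothetical special tokens: "[CLS]"/"[SEP]" follow the BERT convention;
# the patent only speaks of a "start character" and "separator characters".
START_TOKEN = "[CLS]"
SEP_TOKEN = "[SEP]"


def build_pair_sequence(external_term: str, external_pinyin: str,
                        standard_term: str, standard_pinyin: str) -> List[str]:
    """Build the token sequence for one (external term, standard term) pair.

    Chinese terms are split into single characters; pinyin strings are split
    on whitespace into syllables. The segment order is an assumption.
    """
    tokens = [START_TOKEN]
    for segment in (list(external_term), external_pinyin.split(),
                    list(standard_term), standard_pinyin.split()):
        tokens.extend(segment)
        tokens.append(SEP_TOKEN)  # close every segment with a separator
    return tokens


# Usage: aspirin written in Chinese characters, with tone-less pinyin.
seq = build_pair_sequence("阿司匹林", "a si pi lin", "阿司匹林", "a si pi lin")
print(seq[:6])
```

Feeding the pinyin as separate segments lets the language model see that two homophonous spellings share identical syllables even when their characters differ.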
Further, during training of the synonym set classifier, the probability that a drug term to be classified belongs to a synonym set is predicted from the change in the set uniformity score. The set uniformity score is calculated as follows: compute an embedding for each term in the set and input it into a fully connected neural network to obtain a new term representation; sum all new term representations to obtain an initial term set representation; and input this set representation into a fully connected neural network to obtain the set uniformity score.
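A minimal NumPy sketch of the set uniformity score follows, assuming random weights in place of trained ones, ReLU as the activation of the per-term layer, and a single linear layer for the set-level network; set_uniformity_score and the dimensions are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HIDDEN_DIM = 8, 16

# Randomly initialised weights stand in for trained parameters (assumption).
W_term = rng.normal(size=(EMB_DIM, HIDDEN_DIM))   # per-term fully connected layer
b_term = np.zeros(HIDDEN_DIM)
W_set = rng.normal(size=(HIDDEN_DIM, 1))          # set-level fully connected layer
b_set = np.zeros(1)


def set_uniformity_score(term_embeddings: np.ndarray) -> float:
    """Score a term set; term_embeddings has shape (set_size, EMB_DIM)."""
    new_reprs = np.maximum(term_embeddings @ W_term + b_term, 0.0)  # ReLU(FC)
    set_repr = new_reprs.sum(axis=0)   # summing makes the score order-independent
    return float((set_repr @ W_set + b_set)[0])


embeddings = rng.normal(size=(3, EMB_DIM))   # three hypothetical term embeddings
print(set_uniformity_score(embeddings))
```

Summing the per-term representations is what makes the score a function of the set rather than of the order in which its terms are listed.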
Further, the training set of the synonym set classifier is generated as follows: one drug term is drawn at random from a synonym set and, together with the set formed by the remaining drug terms of that synonym set, forms a positive training sample; each positive sample is matched with several negative samples, whose drug terms are drawn from the drug term library after the drug terms of that synonym set have been excluded.
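The sampling scheme can be sketched in Python as below. The sample layout (remaining subset, candidate term, label), the requirement of at least two synonyms, and the helper name make_samples are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

# (remaining synonym subset, candidate term, label); label 1 = candidate belongs.
Sample = Tuple[frozenset, str, int]


def make_samples(synonym_set: Sequence[str], library_terms: Sequence[str],
                 negatives_per_positive: int, rng: random.Random) -> List[Sample]:
    members = set(synonym_set)
    if len(members) < 2:
        raise ValueError("need at least two synonyms to hold one out")
    held_out = rng.choice(sorted(members))          # random positive term
    rest = frozenset(members - {held_out})
    samples: List[Sample] = [(rest, held_out, 1)]
    # Negatives come from the library with this synonym set's terms excluded.
    pool = [t for t in library_terms if t not in members]
    for term in rng.sample(pool, min(negatives_per_positive, len(pool))):
        samples.append((rest, term, 0))
    return samples


rng = random.Random(0)
samples = make_samples(["syn_1", "syn_2", "syn_3"],
                       ["syn_1", "syn_2", "syn_3", "other_1", "other_2", "other_3"],
                       negatives_per_positive=2, rng=rng)
print(samples)
```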
Further, the drug term library is updated according to all synonym sets updated by synonym mining, specifically: if the synonym set of a standard drug term that is a hypernym is updated, a synonym association is established between each new synonym and that hypernym term, and hyponym associations are also established between the new synonym and all standard drug terms that are hyponyms of that hypernym term; if the synonym set of a standard drug term that is not a hypernym is updated, each new synonym is associated with that standard drug term as its synonym.
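The two update cases can be sketched over a set of (head, relation, tail) triples, assuming the convention that (h, "hyponym", t) means t is a hyponym of h; add_mined_synonym and the relation labels are hypothetical names.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail)
SYN, HYPO = "synonym", "hyponym"


def add_mined_synonym(triples: Set[Triple], standard_term: str,
                      new_synonym: str) -> None:
    """Insert a mined synonym of standard_term into the term library in place."""
    # (h, HYPO, t) means t is a hyponym of h.
    hyponyms = {t for (h, r, t) in triples if h == standard_term and r == HYPO}
    triples.add((new_synonym, SYN, standard_term))
    if hyponyms:
        # standard_term is a hypernym: the new synonym inherits its hyponyms.
        for child in hyponyms:
            triples.add((new_synonym, HYPO, child))
    # Otherwise (non-hypernym term) the synonym edge alone is enough.


library = {("esomeprazole", HYPO, "esomeprazole sodium")}
add_mined_synonym(library, "esomeprazole", "alias_of_esomeprazole")  # hypothetical alias
print(sorted(library))
```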
Further, the pre-trained language model is fine-tuned, specifically: the semantic embedding of the start character serves as the independent variable, and the dependent variable is the label indicating whether the external drug term in the electronic medical record is semantically associated with the drug term in the updated drug term library; a prediction is obtained from the semantic embedding through a nonlinear activation function; positive training samples are obtained by manual annotation and negative training samples by random sampling to form a training set, on which the pre-trained language model is trained to optimize the semantic embedding representation.
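The prediction head of this fine-tuning step can be sketched as logistic regression on start-token embeddings, assuming sigmoid as the nonlinear activation and binary cross-entropy as the loss. Unlike the patent, which trains the whole language model, this sketch holds the embeddings fixed and trains only the head; the toy embeddings are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def train_head(cls_embeddings, labels, lr=0.5, epochs=200):
    """Fit w, b so that sigmoid(cls_embeddings @ w + b) predicts the labels.

    Only the linear head is trained here; the embeddings are held fixed.
    """
    w = np.zeros(cls_embeddings.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(cls_embeddings @ w + b)
        grad = p - labels                      # dLoss/dlogit for sigmoid + BCE
        w -= lr * cls_embeddings.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


# Toy start-token embeddings: first two pairs associated, last two not.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b = train_head(X, y)
p = sigmoid(X @ w + b)
print(p, bce_loss(p, y))
```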
Further, the candidate association relationships between the external drug terms and the drug terms in the updated drug term library are established as follows: the TF-IDF value of each word in a drug term is calculated to obtain vector representations of each external drug term in the electronic medical record and each drug term in the updated drug term library; the similarity between the two vectors is calculated, and if it exceeds a preset similarity threshold, a candidate association relationship is established between the external drug term and the corresponding drug term in the library.
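A pure-Python sketch of the candidate linking step follows. It assumes that each character of a Chinese drug term is treated as one token, a smoothed IDF, cosine similarity as the similarity measure, and IDF statistics shared across external and library terms; all names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(terms: List[str]) -> List[Vector]:
    """Character-level TF-IDF with smoothed IDF, one sparse vector per term."""
    counts = [Counter(term) for term in terms]          # each character is a token
    n = len(counts)
    df = Counter(ch for c in counts for ch in c)        # document frequency
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({ch: (k / total) * (math.log((1 + n) / (1 + df[ch])) + 1.0)
                        for ch, k in c.items()})
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_links(external: List[str], library: List[str],
                    threshold: float) -> List[Tuple[str, str, float]]:
    vecs = tfidf_vectors(external + library)            # shared IDF statistics
    ext_vecs, lib_vecs = vecs[:len(external)], vecs[len(external):]
    links = []
    for e, ev in zip(external, ext_vecs):
        for s, lv in zip(library, lib_vecs):
            sim = cosine(ev, lv)
            if sim > threshold:                         # keep only similar pairs
                links.append((e, s, sim))
    return links


links = candidate_links(["abc"], ["abc", "xyz", "abd"], threshold=0.6)
print(links)
```

The similarity value kept in each link is what later fills the adjacency matrix entry for that external-to-library edge.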
Further, the input of each layer of the graph convolutional neural network model consists of two parts, a node embedding matrix and an adjacency matrix, and the output of each layer serves as the node embedding matrix of the next layer. The layer propagation rule of the model is derived from the normalized graph Laplacian, and the model is optimized with a margin-based distance loss function.
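A NumPy sketch of such a model is given below. It assumes the common propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W) with self-loops and degrees taken from row sums, and a hinge loss on L1 distances between aligned and corrupted node pairs; the exact normalization and loss used in the patent are not stated here, so treat these as stand-ins.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees from row sums
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(h: np.ndarray, adj: np.ndarray, weights) -> np.ndarray:
    """Each layer takes (node embedding matrix, adjacency) and feeds the next."""
    a_norm = normalize_adjacency(adj)
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)          # ReLU(A_norm H W)
    return h


def margin_loss(h: np.ndarray, pos_pairs, neg_pairs, margin: float = 1.0) -> float:
    """Hinge loss: aligned pairs should be closer (L1) than corrupted pairs by margin."""
    def dist(i, j):
        return float(np.abs(h[i] - h[j]).sum())
    return sum(max(0.0, dist(*p) - dist(*n) + margin)
               for p, n in zip(pos_pairs, neg_pairs))


rng = np.random.default_rng(0)
adj = np.zeros((3, 3))
adj[0, 1] = adj[1, 0] = 1.0
out = gcn_forward(rng.normal(size=(3, 4)), adj,
                  [rng.normal(size=(4, 5)), rng.normal(size=(5, 2))])
print(out.shape, margin_loss(out, [(0, 1)], [(0, 2)]))
```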
Further, the entries of the adjacency matrix are defined as follows: if there is an edge from one drug term to another in the updated drug term library, the entry is 1, and otherwise 0; if there is an edge from an external drug term in the electronic medical record to a drug term in the updated drug term library, the entry is the similarity value of the corresponding candidate association.
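These rules can be sketched as follows, assuming library nodes come first and external nodes after them, and namespacing node keys so an external term that happens to equal a library term stays a separate node; build_adjacency is a hypothetical helper.

```python
import numpy as np
from typing import Dict, List, Tuple


def build_adjacency(library_terms: List[str], library_edges: List[Tuple[str, str]],
                    external_terms: List[str],
                    candidate_links: List[Tuple[str, str, float]]):
    """Return (adjacency matrix, node index) over library + external term nodes."""
    # Namespaced keys keep an external term distinct from an identical library term.
    index: Dict[Tuple[str, str], int] = {("lib", t): k for k, t in enumerate(library_terms)}
    offset = len(library_terms)
    index.update({("ext", t): offset + k for k, t in enumerate(external_terms)})
    adj = np.zeros((len(index), len(index)))
    for head, tail in library_edges:                 # library edge -> 1
        adj[index[("lib", head)], index[("lib", tail)]] = 1.0
    for ext, lib, sim in candidate_links:            # candidate link -> similarity
        adj[index[("ext", ext)], index[("lib", lib)]] = sim
    return adj, index


adj, index = build_adjacency(["A", "B"], [("A", "B")], ["a"], [("a", "A", 0.8)])
print(adj)
```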
Further, the semantic embedding and the structural embedding are concatenated, and the concatenated representation is input into a multi-layer perceptron comprising several fully connected hidden layers and a single-node output layer; the output is converted into a scalar through a nonlinear activation function, giving the association probability between each external drug term in the electronic medical record and each standard drug term in the updated drug term library.
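A NumPy sketch of the association predictor follows. It assumes the "product" of node embeddings is element-wise, ReLU hidden layers, sigmoid as the output activation, and random weights in place of trained ones; the dimensions and helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)


def init_mlp(in_dim, hidden_dims):
    """Weights for fully connected hidden layers plus a single-node output layer."""
    dims = [in_dim, *hidden_dims, 1]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]


def association_probability(semantic_emb, node_ext, node_std, params):
    structural_emb = node_ext * node_std                # element-wise product (assumed)
    x = np.concatenate([semantic_emb, structural_emb])  # splice the two embeddings
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)                  # hidden layers with ReLU
    w_out, b_out = params[-1]
    logit = (x @ w_out + b_out)[0]                      # single output node
    return float(1.0 / (1.0 + np.exp(-logit)))          # sigmoid -> probability


params = init_mlp(in_dim=6 + 4, hidden_dims=[16, 8])
p = association_probability(rng.normal(size=6), rng.normal(size=4),
                            rng.normal(size=4), params)
print(p)
```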
According to a second aspect of the present specification, a system for standardized association of drug terms in an electronic medical record is provided, comprising:
an electronic medical record drug term input module, used for acquiring all external drug terms in the electronic medical record that require drug term standardization;
a drug term library synonym mining and updating module, used for constructing a corpus and acquiring from it a drug term list for synonym mining; obtaining the synonym set of each standard drug term in the drug term library; training a synonym set classifier to obtain a classification prediction for each drug term in the drug term list against each synonym set in the drug term library; obtaining all synonym sets updated by synonym mining according to a preset probability threshold; and updating the drug term library;
a candidate association relationship establishing module, used for establishing candidate association relationships between external drug terms in the electronic medical record and drug terms in the updated drug term library based on similarity calculation;
a semantic embedding representation module, used for combining each external drug term in the electronic medical record and its pinyin character sequence, and each standard drug term in the updated drug term library and its pinyin character sequence, with a start character and separator characters into the character sequence of a drug term pair, and inputting it into a pre-trained language model to obtain the semantic embedding representation;
a structural embedding representation module, used for taking the semantic embeddings of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initial node embeddings of the corresponding drug terms, inputting them into a graph convolutional neural network model to obtain node embeddings, and taking the product of the node embeddings of the external drug term and the standard drug term as the structural embedding representation;
an association prediction module, used for training an association prediction model based on semantic embedding and structural embedding from the updated drug term library and the external drug terms in the electronic medical record, and for using the model to predict the association between each external drug term in the electronic medical record and the standard drug terms in the drug term library.
The beneficial effects of the invention are as follows: synonym mining enriches both the semantic information and the graph structure information, and the association prediction model uses semantic information and graph structure information simultaneously. Specifically:
1) The drug term library is updated by synonym mining, which addresses the low semantic similarity between standard drug terms in the library and external drug terms in the electronic medical record;
2) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated library, the semantic information is enriched with the pinyin tokens of each term in addition to its Chinese character tokens;
3) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated library, the graph structure information of the drug term library is fully exploited;
4) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated library, graph structure information for the external drug terms is obtained by linking them to drug terms in the drug term library;
5) In this way, embedding representations of each pair consisting of an external drug term in the electronic medical record and a standard drug term in the synonym-mining-updated library are obtained and used for prediction by the association prediction model.
Drawings
FIG. 1 is a flowchart illustrating overall steps of a method for standardized association of drug terminology in an electronic medical record according to an exemplary embodiment;
FIG. 2 is a flowchart of a method for standardized association of drug terminology in electronic medical records provided in an exemplary embodiment;
FIG. 3 is a schematic diagram of an original drug term library provided by an exemplary embodiment;
FIG. 4 is a schematic diagram of a drug term library updated by synonym mining, provided by an exemplary embodiment;
FIG. 5 is a block diagram of a system for standardizing association of drug terminology in electronic medical records provided in an exemplary embodiment.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
As shown in FIG. 1 and FIG. 2, the method for standardized association of drug terms in electronic medical records provided by the embodiment of the invention includes the following steps:
Step S1: Input a drug term library, expressed as $\mathcal{K} = (E, R)$, where $E$ denotes the set of drug terms, including standard drug terms (which may or may not be hypernyms) and synonyms of standard drug terms, and $R$ denotes the set of relations between drug terms, specifically $R = \{r_{\text{hypo}}, r_{\text{syn}}\}$, where $r_{\text{hypo}}$ denotes the hyponym relation and $r_{\text{syn}}$ the synonym relation. A relation between drug terms can be written as $(h, r_{\text{syn}}, t)$, meaning that drug term $h$ is a synonym of $t$, or as $(h, r_{\text{hypo}}, t)$, meaning that $t$ is a hyponym of drug term $h$. The drug terms connected by the relation $r_{\text{syn}}$ in the library are converted into synonym sets, giving the synonym set of each standard drug term;
FIG. 3 shows an example drug term library, in which the standard drug terms include esomeprazole, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate, esomeprazole magnesium trihydrate, gadopentetic acid, gadopentetate meglumine, latamoxef and latamoxef sodium. The hyponyms of esomeprazole include esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate and esomeprazole magnesium trihydrate; each of these relations takes the form $(\text{esomeprazole}, r_{\text{hypo}}, t)$ with $t$ one of the hyponyms. Latamoxef has a synonym, and its hyponyms include latamoxef sodium. The hyponyms of gadopentetic acid include gadopentetate meglumine, whose synonyms are two alternative names that are both rendered in English as "gadopentetate meglumine"; each of these relations takes the form $(\text{gadopentetate meglumine}, r_{\text{syn}}, t)$ with $t$ one of the synonyms. The standard drug terms are the target association objects of this embodiment.
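The conversion of synonym relations into per-term synonym sets can be sketched with a union-find structure over (head, relation, tail) triples. The relation labels, the helper name synonym_sets and the placeholder "syn_A" are assumptions for illustration; hyponym edges are deliberately ignored, and a standard term without synonyms keeps a one-element set.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail)


def synonym_sets(standard_terms: List[str],
                 triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Group terms linked by synonym relations (union-find); hyponym edges are ignored."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for term in standard_terms:
        find(term)                           # every standard term gets a set
    for head, relation, tail in triples:
        if relation == "synonym":
            union(head, tail)

    groups: Dict[str, Set[str]] = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return {term: groups[find(term)] for term in standard_terms}


sets = synonym_sets(["latamoxef", "latamoxef sodium"],
                    [("syn_A", "synonym", "latamoxef"),          # hypothetical synonym
                     ("latamoxef", "hyponym", "latamoxef sodium")])
print(sets)
```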
Step S2: aiming at the problem of incomplete synonym relation caused by the problems of irregular translation, inaccurate manual labeling, untimely data updating and the like of an original drug term library, adopting a synonym mining method to perfect the synonym relation of the drug term library, and obtaining the drug term library based on synonym mining updating; the method specifically comprises the following substeps:
Step S21: Obtain the Chinese abstracts and full texts of drug-related Chinese literature from CNKI (China National Knowledge Infrastructure) and other literature retrieval platforms to form the synonym mining corpus $C$, and extract drug terms for synonym mining with a named entity recognition method, expressed as the drug term list $L = \{l_1, l_2, \dots, l_n\}$, where $l_i$ denotes the $i$-th drug term and $n$ denotes the number of drug terms in $L$. The named entity recognition method adopted in this embodiment is a conditional random field (CRF) model;
All synonym sets in the drug term library are denoted $\mathcal{S} = \{S_1, S_2, \dots, S_m\}$, where $S_j$ is the synonym set of the j-th standard drug term and $m$ is the number of synonym sets in $\mathcal{S}$, which equals the number of standard drug terms in the drug term library. If a standard drug term has no synonyms, its synonym set contains only one element, namely the standard drug term itself.
Step S22: A synonym set classifier is trained to obtain classification predictions between each drug term in the drug term list $T$ and the synonym sets in the drug term library. This step comprises the following sub-steps:
Step S221: The synonym set classifier is denoted $f(S, t)$, where $S$ is a synonym set and $t$ is a drug term to be categorized into a synonym set;
Step S222: The probability that the drug term $t$ to be categorized belongs to the synonym set $S$ is predicted from the change in the set uniformity score:
$$\Pr(t \in S) = \sigma\big(q(S \cup \{t\}) - q(S)\big)$$
where $\Pr$ denotes probability, $\sigma$ is the sigmoid activation function, and $q(\cdot)$ is the set uniformity score function;
Step S223: Consider a set of training data $\{(S_i, t_i, y_i)\}_{i=1}^{N}$, where $(S_i, t_i)$ pairs a synonym set $S_i$ with a drug term $t_i$ to be categorized into $S_i$, and $y_i$ is the label: $y_i = 1$ means $t_i \in S_i$ and $y_i = 0$ means $t_i \notin S_i$. In this embodiment, the synonym set classifier uses a fully connected neural network model, and its loss function is the log loss:
$$\mathcal{L}(f) = -\sum_{i=1}^{N}\Big[y_i \log f(S_i, t_i) + (1 - y_i)\log\big(1 - f(S_i, t_i)\big)\Big]$$
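The log loss of the synonym set classifier can be computed directly from classifier outputs and labels. This is a minimal sketch assuming the loss is summed over samples; `probs[i]` stands for $f(S_i, t_i)$, and the clipping constant `eps` is an added numerical safeguard, not part of the described method.

```python
import math


def log_loss(probs, labels, eps=1e-12):
    """Summed log loss over samples (S_i, t_i, y_i).

    probs[i] is the classifier output f(S_i, t_i); labels[i] is y_i in {0, 1}.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


print(round(log_loss([0.9, 0.2], [1, 0]), 4))  # 0.3285
```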
In one embodiment, the set uniformity score $q(S)$ is estimated as follows. First, for each term $t$ in the term set $S$, its embedding $x_t$ is computed with a text embedding method and used as the initialization parameter of the embedding layer, which takes the term as input; the text embedding method used in this embodiment is Word2Vec. The embedding is then fed into the fully connected neural network model to obtain a new representation $\phi(x_t)$ of the term. The new representations of all terms in $S$ are summed and averaged, which guarantees permutation invariance, giving the initial term set representation $z = \frac{1}{|S|}\sum_{t \in S}\phi(x_t)$. Finally, $z$ is fed into the fully connected neural network model to obtain the uniformity score $q(S) = \rho(z)$ of the term set, which measures the degree of similarity among all terms in $S$.
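The set scorer and the probability $\Pr(t \in S) = \sigma(q(S \cup \{t\}) - q(S))$ from step S222 can be sketched in NumPy. This is an illustrative sketch only: random vectors stand in for Word2Vec embeddings, random weights stand in for the trained fully connected networks, and the single ReLU layer and the sizes `DIM`/`HID` are assumptions, not values from the method.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HID = 8, 16  # illustrative sizes, not from the method

# Random weights standing in for the two trained fully connected networks.
W_term, b_term = rng.normal(size=(DIM, HID)), np.zeros(HID)  # per-term network
W_set, b_set = rng.normal(size=(HID, 1)), np.zeros(1)        # set-level network


def set_score(term_embs):
    """q(S): transform each term, mean-pool (order-independent), score the pool."""
    h = np.maximum(np.asarray(term_embs) @ W_term + b_term, 0.0)  # new term representations
    z = h.mean(axis=0)                                            # initial set representation
    return float((z @ W_set + b_set)[0])                          # scalar uniformity score


def prob_in_set(set_embs, term_emb):
    """Pr(t in S) = sigmoid(q(S ∪ {t}) - q(S))."""
    delta = set_score(list(set_embs) + [term_emb]) - set_score(set_embs)
    return 1.0 / (1.0 + np.exp(-delta))


S = rng.normal(size=(3, DIM))  # stand-ins for embeddings of three synonyms
t = rng.normal(size=DIM)       # stand-in embedding of a candidate term
print(prob_in_set(S, t))
```

Mean pooling is what makes the score independent of the order in which set members are listed.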
In one embodiment, the training set is generated as follows. First, the synonym relations in the drug term library are converted into synonym sets, and all synonym sets in the drug term library are denoted $\mathcal{S}$. Each synonym set is denoted $ES = \{e_1, e_2, \dots\}$, where each $e_k$ is a drug term in the set. One drug term $e$ is drawn at random from $ES$, and the remaining terms form the set $ES \setminus \{e\}$, giving a positive training sample $(ES \setminus \{e\}, e)$ with label $y = 1$. Each positive sample is matched with $K$ negative samples $(ES \setminus \{e\}, e^-)$ with label $y = 0$; in this example $K = 5$. The negative terms $e^-$ are drawn from the drug term library after the drug terms in $ES$ are removed, by mixing samples obtained in two ways in a set proportion: (1) completely random sampling; (2) random sampling restricted to drug terms that share characters with the terms in $ES \setminus \{e\}$. In this example, the proportion is set to 2:3.
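The sampling procedure above can be sketched in Python. This is a hedged sketch: the function name and arguments are hypothetical, skipping singleton synonym sets (which would leave an empty $ES \setminus \{e\}$) is an assumption, and the 2:3 split is applied per positive sample as 2 random and 3 character-overlap negatives when $K = 5$.

```python
import random


def make_training_samples(synsets, vocab, K=5, ratio=(2, 3), seed=0):
    """Generate (S, t, y) samples for the synonym set classifier.

    synsets: iterable of synonym sets; vocab: all drug terms in the library.
    ratio: (completely random, character-overlap) share of the K negatives.
    """
    rnd = random.Random(seed)
    n_overlap = round(K * ratio[1] / sum(ratio))  # 3 of 5 negatives for 2:3
    samples = []
    for es in synsets:
        es = sorted(es)  # fixed order so sampling is reproducible
        if len(es) < 2:
            continue  # assumption: a singleton set cannot yield a non-empty S
        t = rnd.choice(es)
        rest = frozenset(x for x in es if x != t)
        samples.append((rest, t, 1))  # positive sample
        pool = [v for v in vocab if v not in es]  # exclude the whole set ES
        chars = set("".join(rest))
        overlap = [v for v in pool if chars & set(v)] or pool  # fall back if empty
        for k in range(K):
            source = overlap if k < n_overlap else pool
            samples.append((rest, rnd.choice(source), 0))  # negative sample
    return samples


# Two Chinese names of esomeprazole; latamoxef; omeprazole; aspirin; latamoxef sodium.
synsets = [{"艾司奥美拉唑", "埃索美拉唑"}, {"拉氧头孢"}]
vocab = ["艾司奥美拉唑", "埃索美拉唑", "拉氧头孢", "奥美拉唑", "阿司匹林", "拉氧头孢钠"]
samples = make_training_samples(synsets, vocab)
print(len(samples))  # 6
```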
Step S224: The model is trained on the training set, and the trained model predicts the probability that each drug term in the drug term list $T$ belongs to each synonym set in the drug term library.
Specifically, before synonym mining, all synonym sets in the drug term library are denoted $\mathcal{S} = \{S_1, \dots, S_m\}$. For each term $t$ in $T$, its probability $\Pr(t \in S_j)$ of belonging to each synonym set $S_j \in \mathcal{S}$ is computed, and the maximum $\max_j \Pr(t \in S_j)$ is taken as the prediction for $t$. A probability threshold $\delta$ is set: if the maximum probability is greater than $\delta$, $t$ is added to the corresponding synonym set, which is thereby updated; if the maximum probability is less than or equal to $\delta$, $t$ is put back into $T$. The next cycle then begins, and cycling continues until every drug term remaining in $T$ has a probability less than or equal to $\delta$ of belonging to any synonym set. The result is all synonym sets updated by synonym mining, denoted $\mathcal{S}'$. In this embodiment, $\delta$ = [Figure SMS_67].
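The mining loop can be sketched as follows. This is a minimal sketch: `prob` stands for any trained classifier returning $\Pr(t \in S)$, `toy_prob` is a hypothetical stand-in used only for the demonstration, and the loop re-scores put-back terms on each pass because earlier additions can change the sets.

```python
def mine_synonyms(term_list, synsets, prob, delta):
    """Iteratively assign mined terms to synonym sets (sketch of step S224).

    prob(S, t): probability that term t belongs to synonym set S.
    A term joins its best-scoring set when that probability exceeds delta;
    otherwise it is put back. Stops when a full pass adds nothing.
    """
    synsets = [set(s) for s in synsets]  # copy so the inputs are not modified
    remaining = list(term_list)
    changed = True
    while changed and remaining:
        changed = False
        put_back = []
        for t in remaining:
            scores = [prob(s, t) for s in synsets]
            best = max(range(len(synsets)), key=lambda j: scores[j])
            if scores[best] > delta:
                synsets[best].add(t)  # update the matching synonym set
                changed = True
            else:
                put_back.append(t)  # returned to the term list
        remaining = put_back
    return synsets, remaining


def toy_prob(synset, term):
    """Hypothetical classifier stand-in: 0.9 if the term shares a 4-letter prefix."""
    return 0.9 if any(term[:4] == s[:4] for s in synset) else 0.1


sets, left = mine_synonyms(["esom-variant", "lata-variant", "aspirin"],
                           [{"esomeprazole"}, {"latamoxef"}], toy_prob, delta=0.5)
print(left)  # ['aspirin']
```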
Step S23: The drug term library is updated according to all synonym sets updated by synonym mining, giving the drug term library updated by synonym mining.
Specifically, if the synonym set of a standard drug term that serves as a hypernym has been updated, a synonym association is established between each new synonym and that hypernym standard drug term; in addition, each new synonym is linked by hyponym associations to all non-hypernym standard drug terms whose hypernym is that standard drug term. If the synonym set of a non-hypernym standard drug term has been updated, a synonym association is established between each new synonym and that standard drug term. In the example of FIG. 4, the synonym sets updated by synonym mining are [{esomeprazole, a variant name of esomeprazole}, {esomeprazole sodium, a variant name of esomeprazole sodium}, …]. The new synonym of esomeprazole is linked by a synonym association to the hypernym standard drug term esomeprazole, and the standard drug terms whose hypernym is esomeprazole (esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate and esomeprazole magnesium trihydrate) are linked to it by hyponym associations; a synonym association is established between esomeprazole sodium and its new synonym. The result is the drug term library updated by synonym mining.
Step S3: An association prediction model based on semantic embedding and structural embedding is trained using the drug term library updated by synonym mining and the external drug terms in real-world electronic medical record data. This step comprises the following sub-steps:
Step S31: A pre-trained language model is used to obtain the semantic embedding of each pair formed by an external drug term in the electronic medical records and a standard drug term in the updated drug term library; in this embodiment, the pre-trained language model is BERT;
Specifically, the set of external drug terms in the real-world electronic medical records is denoted $G$, and the set of standard drug terms in the updated drug term library is denoted $E$. For any external drug term $g \in G$, its pinyin character sequence is denoted $\mathrm{py}(g)$; for any standard drug term $e \in E$, its pinyin character sequence is denoted $\mathrm{py}(e)$. The terms and their pinyin sequences are combined with the start token [CLS] and the separator token [SEP] into the character sequence of the drug term pair, $[\mathrm{CLS}]\ g\ \mathrm{py}(g)\ [\mathrm{SEP}]\ e\ \mathrm{py}(e)\ [\mathrm{SEP}]$. Taking $g$ = 艾司奥美拉唑钠 (esomeprazole sodium) and $e$ = 埃索美拉唑钠 (a variant name of esomeprazole sodium) as an example, the pair character sequence is {[CLS][艾][司][奥][美][拉][唑][钠][ai][si][ao][mei][la][zuo][na][SEP][埃][索][美][拉][唑][钠][ai][suo][mei][la][zuo][na][SEP]}. In this embodiment, a BERT model pre-trained on a Chinese corpus is used: the pair character sequence passes through multiple bidirectional Transformer encoder layers to obtain its semantic embedding, and the semantic embedding of the start token [CLS], denoted $h_{\mathrm{CLS}}$, is used to represent the relationship between $g$ and $e$.
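The pair sequence format can be sketched as plain token lists. This hypothetical helper stops before BERT tokenization (mapping tokens to vocabulary ids is not shown), and the pinyin table is a hand-written stand-in covering only the example characters; a real system would need a complete character-to-pinyin dictionary.

```python
# Hypothetical character-to-pinyin table covering only this example.
PINYIN = {"艾": "ai", "司": "si", "奥": "ao", "美": "mei", "拉": "la",
          "唑": "zuo", "钠": "na", "埃": "ai", "索": "suo"}


def to_pinyin(term):
    """Pinyin syllable for each character of the term."""
    return [PINYIN[ch] for ch in term]


def pair_tokens(g, e):
    """[CLS] chars(g) pinyin(g) [SEP] chars(e) pinyin(e) [SEP]"""
    return (["[CLS]"] + list(g) + to_pinyin(g) + ["[SEP]"]
            + list(e) + to_pinyin(e) + ["[SEP]"])


tokens = pair_tokens("艾司奥美拉唑钠", "埃索美拉唑钠")
print(" ".join(tokens))
```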
In this embodiment, the BERT model is fine-tuned as follows. The semantic embedding of the start token, $h_{\mathrm{CLS}}$, is the independent variable, and the dependent variable is the label $y_{g,e}$ indicating whether the external drug term $g$ and the standard drug term $e$ in the updated drug term library are semantically associated: $y_{g,e} = 1$ if a semantic association exists, otherwise $y_{g,e} = 0$. A nonlinear activation function $\sigma$ applied to $h_{\mathrm{CLS}}$ yields the prediction $\hat{y}_{g,e}$ based on the BERT semantic embedding. This embodiment uses the sigmoid activation function $\sigma(z) = 1/(1 + e^{-z})$, and the loss function is the binary cross-entropy
$$\mathcal{L} = -\big[y_{g,e}\log \hat{y}_{g,e} + (1 - y_{g,e})\log(1 - \hat{y}_{g,e})\big]$$
Positive training samples are obtained by manual annotation and negative training samples by random sampling; the resulting training set is used to train the BERT model and optimize its semantic embeddings.
Step S32: A graph convolutional neural network (GCN) model is used to obtain the structural embedding of each pair formed by an external drug term in the electronic medical records and a standard drug term in the updated drug term library;
Specifically, candidate associations are established between the external drug terms in the real-world electronic medical records and the drug terms in the updated drug term library: the TF-IDF value of each word in each drug term is calculated, giving vector representations of each external drug term in the electronic medical records and of each drug term in the updated library; the similarity between two such vectors is computed with cosine similarity, and a similarity threshold is set. If the similarity is greater than the threshold, a candidate association, denoted $(g, t)$, is established between the external drug term $g$ in the electronic medical record and the corresponding drug term $t$ in the library.
As shown, for example, in fig. 4, esomeprazole establishes a candidate association with esomeprazole magnesium, which establishes a candidate association with esomeprazole magnesium.
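Candidate association by TF-IDF and cosine similarity can be sketched in plain Python. Assumptions: "word" is taken to mean a single Chinese character, IDF is fitted over the union of external and library terms with a +1 smoothing term, and the 0.3 threshold is illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(terms):
    """Character-level TF-IDF vectors (as dicts) for a list of terms."""
    docs = [Counter(t) for t in terms]
    n = len(docs)
    df = Counter(ch for d in docs for ch in d)  # number of terms containing each character
    idf = {ch: math.log(n / df[ch]) + 1.0 for ch in df}  # +1 keeps shared characters nonzero
    return [{ch: c / sum(d.values()) * idf[ch] for ch, c in d.items()} for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_pairs(external, library, threshold):
    """Return (external term, library term, similarity) for similarity > threshold."""
    vecs = tfidf_vectors(list(external) + list(library))
    ext_vecs, lib_vecs = vecs[:len(external)], vecs[len(external):]
    pairs = []
    for g, gv in zip(external, ext_vecs):
        for t, tv in zip(library, lib_vecs):
            sim = cosine(gv, tv)
            if sim > threshold:
                pairs.append((g, t, sim))
    return pairs


# 艾司奥美拉唑钠 / 埃索美拉唑钠: two names of esomeprazole sodium; 拉氧头孢: latamoxef.
pairs = candidate_pairs(["艾司奥美拉唑钠"], ["埃索美拉唑钠", "拉氧头孢"], threshold=0.3)
print(pairs)
```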
Each external drug term in the electronic medical records and each drug term in the updated drug term library is converted into its own input sequence. The BERT model trained in step S31 computes the semantic embedding of each sequence, and the semantic embedding of the start token [CLS] serves as the initial node embedding of the corresponding drug term;
The initial node embeddings are fed into the GCN model. Specifically, the GCN model has $L$ layers ($L = 10$ in this embodiment). The input of layer $l$ has two parts: the first is the node embedding matrix $H^{(l)} \in \mathbb{R}^{n \times d_l}$, where $n$ is the number of nodes, i.e., the total number of external drug terms in the electronic medical records and drug terms in the updated drug term library, and $d_l$ is the node embedding dimension of layer $l$; the second is the $n \times n$ adjacency matrix $A$. The output of layer $l$ serves as the node embedding matrix of layer $l + 1$ and is obtained through the normalized graph Laplacian transformation:
$$H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\big)$$
where $\sigma$ is a nonlinear activation function (the sigmoid function may be used), $\tilde{A} = A + I$, $I$ is the identity matrix, $\tilde{D}$ is the diagonal matrix with diagonal entries $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $W^{(l)}$ is the weight matrix of layer $l$;
The adjacency matrix $A$ takes values as follows: if there is an edge from drug term $t_i$ to drug term $t_j$ in the updated drug term library, $A_{ij} = 1$, otherwise $A_{ij} = 0$; if there is an edge from an external drug term $g_i$ in the electronic medical records to a drug term $t_j$ in the updated library, $A_{ij}$ takes the similarity value of that candidate association;
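The layer rule and adjacency matrix can be sketched in NumPy. This sketch assumes the usual form where $\tilde{D}$ holds the row sums of $\tilde{A}$; the node layout, the 0.8 similarity, random stand-ins for the [CLS] embeddings and random weights are all illustrative, not values from the method.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2}, where D~_ii is the i-th row sum of A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Scaling entry (i, j) by d_i^{-1/2} d_j^{-1/2} equals D^{-1/2} A D^{-1/2}.
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(H, A, weights):
    """Apply H^{l+1} = sigmoid(A_hat H^l W^l) once per weight matrix."""
    A_hat = normalized_adjacency(A)
    for W in weights:
        H = 1.0 / (1.0 + np.exp(-(A_hat @ H @ W)))
    return H


rng = np.random.default_rng(0)
n, dim, L = 3, 8, 10  # nodes: 2 library terms (0, 1) + 1 external term (2)
A = np.zeros((n, n))
A[0, 1] = 1.0   # library edge t_0 -> t_1
A[2, 0] = 0.8   # external term -> t_0, candidate similarity 0.8
H0 = rng.normal(size=(n, dim))  # stand-in for [CLS] node embeddings
weights = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(L)]
H = gcn_forward(H0, A, weights)
print(H.shape)  # (3, 8)
```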
The output of the final ($L$-th) GCN layer serves as the node embeddings of the drug terms in the updated drug term library and of the external drug terms in the electronic medical records. For each standard drug term $e$ in the updated library and each external drug term $g$ in the electronic medical records, the product of their node embeddings $h_e$ and $h_g$ is taken as the structural embedding of the association between $e$ and $g$, denoted $s_{g,e}$. The GCN model is optimized with a margin-based distance loss:
$$\mathcal{L} = \sum_{(g,e) \in P}\ \sum_{(g',e') \in N} \max\big(0,\ d(g,e) - d(g',e') + \gamma\big)$$
where $d(g,e)$ is the distance function between the structural embeddings of the standard drug term $e$ in the updated library and the external drug term $g$ in the electronic medical record, $\gamma > 0$ is the margin hyperparameter separating positive and negative samples, and $P$ and $N$ are the positive and negative sample sets, respectively. The distance function used in this embodiment is the Euclidean distance, $d(g,e) = \lVert h_g - h_e \rVert_2$. In this embodiment, $\gamma$ = [Figure SMS_137].
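A margin loss of this kind can be sketched as follows. Assumptions: every positive pair is compared against every negative pair, and $d$ is the Euclidean distance between the two nodes' GCN embeddings; the embeddings and margins below are toy values.

```python
import numpy as np


def margin_loss(emb, pos, neg, gamma):
    """Sum of max(0, d(p) - d(q) + gamma) over every positive p and negative q.

    emb: dict mapping each node (term) to its GCN embedding.
    pos / neg: lists of (external term, library term) pairs.
    d is the Euclidean distance between the two nodes' embeddings.
    """
    def d(pair):
        g, e = pair
        return float(np.linalg.norm(emb[g] - emb[e]))

    return sum(max(0.0, d(p) - d(q) + gamma) for p in pos for q in neg)


emb = {"g": np.array([0.0, 0.0]), "e_pos": np.array([1.0, 0.0]), "e_neg": np.array([3.0, 4.0])}
print(margin_loss(emb, [("g", "e_pos")], [("g", "e_neg")], gamma=1.0))  # 0.0
print(margin_loss(emb, [("g", "e_pos")], [("g", "e_neg")], gamma=5.0))  # 1.0
```

With the positive pair at distance 1 and the negative at distance 5, the loss is zero until the margin exceeds their gap of 4.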
Step S33: The semantic embedding $h_{\mathrm{CLS}}$ output in step S31 and the structural embedding $s_{g,e}$ output in step S32 are concatenated into $u_{g,e} = [h_{\mathrm{CLS}}; s_{g,e}]$, which serves as the input of a multilayer perceptron (MLP). The MLP consists of several fully connected hidden layers and a single-node output layer, whose output is denoted $o_{g,e}$. The output of the MLP is converted into a scalar by a nonlinear activation function $\sigma$, finally giving as output the probability $p_{g,e} = \sigma(o_{g,e})$ that the external drug term $g$ in the electronic medical record is associated with the standard drug term $e$ in the updated drug term library. This embodiment uses the sigmoid activation function $\sigma(z) = 1/(1 + e^{-z})$. As with the BERT model, the loss function is the binary cross-entropy
$$\mathcal{L} = -\big[y_{g,e}\log p_{g,e} + (1 - y_{g,e})\log(1 - p_{g,e})\big]$$
where $y_{g,e}$ is the label indicating whether $g$ and $e$ are associated: $y_{g,e} = 1$ if an association exists, otherwise $y_{g,e} = 0$.
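The forward pass of the association predictor can be sketched in NumPy. Illustrative only: the layer widths, ReLU hidden activations, the 768/64 input sizes and the random weights are assumptions; only the concatenation, the single-node output layer and the sigmoid follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Random (W, b) per layer; the last layer must have a single output unit."""
    return [(rng.normal(scale=0.1, size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def association_prob(semantic, structural, layers):
    """p(g, e) = sigmoid(MLP([semantic ; structural]))."""
    x = np.concatenate([semantic, structural])  # spliced input vector
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # fully connected hidden layer, ReLU assumed
    W, b = layers[-1]
    logit = float((x @ W + b)[0])  # single-node output layer
    return 1.0 / (1.0 + np.exp(-logit))


sem_dim, struct_dim = 768, 64  # illustrative sizes only
layers = init_mlp([sem_dim + struct_dim, 128, 32, 1])
p = association_prob(rng.normal(size=sem_dim), rng.normal(size=struct_dim), layers)
print(p)
```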
Step S4: The association prediction model predicts the association between each external drug term in the electronic medical records and the standard drug terms in the drug term library, thereby establishing associations between the external drug terms in real-world electronic medical records and the standard drug terms in the drug term library.
As shown in fig. 5, the present invention further provides an embodiment of a system for standardized association of drug terms in electronic medical records implemented based on the above method, where the system includes:
the electronic medical record drug term input module, used to acquire all external drug terms in the electronic medical records that require drug term standardization;
the drug term library synonym mining update module, used to construct a corpus and acquire from it a drug term list for synonym mining; obtain the synonym set of each standard drug term in the drug term library; train a synonym set classifier to obtain classification predictions between each drug term in the drug term list and the synonym sets in the drug term library; obtain all synonym sets updated by synonym mining according to a preset probability threshold; and update the drug term library;
the candidate association establishing module, used to establish candidate associations between the external drug terms in the electronic medical records and the drug terms in the updated drug term library based on similarity calculation;
the semantic embedding module, used to combine the external drug terms in the electronic medical records and their pinyin character sequences with the standard drug terms in the updated drug term library and their pinyin character sequences, adding a start token and separator tokens to form the character sequence of each drug term pair, and to input that sequence into a pre-trained language model to obtain the semantic embedding;
the structural embedding module, used to take the semantic embeddings of the external drug terms in the electronic medical records and of the drug terms in the updated drug term library as the initial node embeddings of the corresponding drug terms, input them into a graph convolutional neural network model to obtain node embeddings of the corresponding drug terms, and take the product of the node embeddings of an external drug term and a standard drug term as the structural embedding;
the association prediction module, used to train an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical records, and to use the association prediction model to predict associations between the external drug terms in the electronic medical records and the standard drug terms in the drug term library.
Corresponding to the foregoing embodiment of the method for standardized association of drug terms in electronic medical records, the invention also provides an embodiment of a device for standardized association of drug terms in electronic medical records. The device provided by the embodiment of the invention comprises a memory and one or more processors; executable code is stored in the memory, and when executing the executable code the processors implement the method for standardized association of drug terms in electronic medical records of the above embodiment.
The embodiment of the invention also provides a computer-readable storage medium storing a program which, when executed by a processor, implements the method for standardized association of drug terms in electronic medical records of the above embodiment.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in any of the foregoing embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card provided on the device. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of any device with data processing capability. It is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the particular embodiments described.

Claims (10)

1. A method for standardized association of drug terms in electronic medical records, characterized by comprising the following steps:
S1, inputting a drug term library to obtain the synonym set of each standard drug term;
S2, obtaining a drug term library updated by synonym mining, comprising:
constructing a corpus for synonym mining, and acquiring a drug term list from the corpus;
training a synonym set classifier to obtain classification predictions between each drug term in the drug term list and the synonym sets in the drug term library, and obtaining all synonym sets updated by synonym mining according to a preset probability threshold;
updating the drug term library according to all synonym sets updated by synonym mining;
S3, training an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, comprising:
obtaining, through a pre-trained language model, the semantic embedding of each pair formed by an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: combining the external drug term and its pinyin character sequence and the standard drug term and its pinyin character sequence with a start token and separator tokens to form the character sequence of the drug term pair, and inputting that character sequence into the pre-trained language model to obtain the semantic embedding;
obtaining, through a graph convolutional neural network model, the structural embedding of each pair formed by an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: establishing candidate associations between the external drug terms and the drug terms in the updated drug term library based on similarity calculation; taking the semantic embeddings of the external drug terms and of the drug terms in the updated drug term library as the initial node embeddings of the corresponding drug terms; inputting them into the graph convolutional neural network model to obtain node embeddings of the corresponding drug terms; and taking the product of the node embeddings of the external drug term and the standard drug term as the structural embedding;
S4, using the association prediction model to predict associations between the external drug terms in the electronic medical record and the standard drug terms in the drug term library.
2. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein, during training of the synonym set classifier, the probability that a drug term to be categorized belongs to a synonym set is predicted from the change in the set uniformity score, and the set uniformity score is calculated by: computing an embedding of each term in the set; inputting the embedding into a fully connected neural network model to obtain a new term representation; summing all new term representations to obtain an initial term set representation; and inputting the initial term set representation into a fully connected neural network model to obtain the set uniformity score.
3. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the training set of the synonym set classifier is generated by: randomly extracting one drug term from a synonym set and combining it with the set formed by the remaining drug terms of that synonym set to generate a positive training sample; and, for each positive training sample, matching several negative training samples drawn from the drug term library after the drug terms in that synonym set are excluded.
4. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein updating the drug term library according to all synonym sets updated by synonym mining specifically comprises: if the synonym set of a standard drug term that is a hypernym is updated, establishing a synonym association between the corresponding synonym and that hypernym standard drug term, and simultaneously establishing hyponym associations between the corresponding synonym and all standard drug terms whose hypernym is that standard drug term; if the synonym set of a non-hypernym standard drug term is updated, establishing a synonym association between the corresponding synonym and that non-hypernym standard drug term.
5. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the pre-trained language model is fine-tuned, specifically: the semantic embedding of the start token is the independent variable, and the dependent variable is a label indicating whether the external drug term in the electronic medical record and the drug term in the updated drug term library are semantically associated; a nonlinear activation function yields a prediction based on the semantic embedding; positive training samples are obtained by manual annotation and negative training samples by random sampling to form a training set, on which the pre-trained language model is trained to optimize the semantic embedding.
6. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the candidate associations between the external drug terms and the drug terms in the updated drug term library are established specifically by: calculating the TF-IDF value of each word in each drug term to obtain vector representations of the external drug terms in the electronic medical record and of each drug term in the updated drug term library; calculating the similarity between them; and, if the similarity is greater than a preset similarity threshold, establishing a candidate association between the external drug term in the electronic medical record and the corresponding drug term in the drug term library.
7. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the input of each layer of the graph convolutional neural network model comprises two parts, the first being a node embedding matrix and the second an adjacency matrix; the output of each layer, obtained through a normalized graph Laplacian transformation, serves as the node embedding matrix of the next layer; and the graph convolutional neural network model is optimized with a margin-based distance loss function.
8. The method for standardized association of drug terms in electronic medical records according to claim 7, wherein the adjacency matrix takes values as follows: if there is an edge from one drug term to another in the updated drug term library, the corresponding value is 1, otherwise 0; if there is an edge from an external drug term in the electronic medical record to a drug term in the updated drug term library, the corresponding value is the similarity value of the candidate association.
9. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the semantic embedding and the structural embedding are concatenated and input into a multilayer perceptron comprising several fully connected hidden layers and a single-node output layer, and the output of the multilayer perceptron is converted into a scalar by a nonlinear activation function to obtain the association probability between each external drug term in the electronic medical record and each standard drug term in the updated drug term library.
10. A system for standardizing association of drug terminology in electronic medical records implemented based on the method of any one of claims 1-9, comprising:
the electronic medical record drug term input module is used for acquiring all external drug terms to be subjected to drug term standardization in the electronic medical record;
the drug term library synonym mining and updating module is used for constructing a corpus and acquiring from it a list of drug terms for synonym mining; obtaining the synonym set of each standard drug term in the drug term library; training a synonym set classifier to obtain a classification prediction result for each drug term in the drug term list against each synonym set in the drug term library; and, according to a preset probability threshold, obtaining all synonym sets updated by synonym mining and updating the drug term library;
the candidate association relationship establishing module is used for establishing, based on similarity calculation, candidate association relationships between the external drug terms in the electronic medical record and the drug terms in the updated drug term library;
the semantic embedding representation module is used for combining an external drug term in the electronic medical record and its pinyin character sequence with a standard drug term in the updated drug term library and its pinyin character sequence, together with a start character and separator characters, into an associated drug term pair character sequence, and inputting this sequence into a pre-trained language model to obtain the semantic embedding representation;
the structural embedding representation module is used for taking the semantic embedding representations of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initial node embedding representations of the corresponding drug terms, inputting them into a graph convolutional neural network model to obtain the node embedding representations of the corresponding drug terms, and taking the product of the node embedding representations of an external drug term and a standard drug term as the structural embedding representation;
the association prediction module is used for training an association prediction model based on the semantic embedding and structural embedding representations, using the updated drug term library and the external drug terms in the electronic medical record, and for using the association prediction model to predict the association result between each external drug term in the electronic medical record and the standard drug terms in the drug term library.
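For illustration only, the layer-wise propagation of claim 7 can be sketched in plain Python. The sketch assumes that the "normalized graph Laplacian transformation" is the symmetric renormalization D^-1/2 (A + I) D^-1/2 popularized by Kipf and Welling's GCN, that each layer applies ReLU, and that the margin-based distance loss is an L1 hinge loss over (anchor, positive, negative) embeddings. None of these choices, nor the helper names `normalized_adjacency`, `gcn_layer` and `margin_loss`, come from the patent.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, with D the row sums of A + I (assumed normalization)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops (assumes non-negative A)
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(h: Matrix, adj: Matrix, w: Matrix) -> Matrix:
    """One layer: ReLU(norm_adj @ H @ W); the result is the next layer's node embedding matrix."""
    propagated = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in propagated]


def margin_loss(anchor: Sequence[float], positive: Sequence[float],
                negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss max(0, margin + d(anchor, positive) - d(anchor, negative)) with L1 distance d."""
    def l1(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(abs(x - y) for x, y in zip(u, v))
    return max(0.0, margin + l1(anchor, positive) - l1(anchor, negative))


# Two linked nodes with one-hot features and an identity weight matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(identity, [[0.0, 1.0], [1.0, 0.0]], identity))  # [[0.5, 0.5], [0.5, 0.5]]
print(margin_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]))           # 2.0
```

Because each layer's output keeps the row order of its input, it can be fed straight back in as the next layer's node embedding matrix, which is the layer chaining the claim describes.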
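The adjacency rule of claim 8 can be sketched as follows, under the assumptions that node indices list the library terms first and the external terms second, that every term string is unique, and that library edges are supplied as (source, target) pairs. The function name `build_adjacency`, the example terms and the 0.82 similarity value are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_adjacency(
    library_terms: Sequence[str],
    external_terms: Sequence[str],
    library_edges: Sequence[Tuple[str, str]],
    candidate_links: Dict[Tuple[str, str], float],
) -> List[List[float]]:
    """Adjacency matrix over the library terms followed by the external EMR terms.

    edge library term -> library term   : 1.0
    edge external term -> library term  : similarity of the candidate association
    no edge                             : 0.0
    """
    nodes = list(library_terms) + list(external_terms)
    index = {term: i for i, term in enumerate(nodes)}  # assumes every term string is unique
    adj = [[0.0] * len(nodes) for _ in nodes]
    for src, dst in library_edges:
        adj[index[src]][index[dst]] = 1.0
    for (ext, std), similarity in candidate_links.items():
        adj[index[ext]][index[std]] = similarity
    return adj


adj = build_adjacency(
    library_terms=["aspirin", "acetylsalicylic acid"],
    external_terms=["asprin tablets"],
    library_edges=[("aspirin", "acetylsalicylic acid")],
    candidate_links={("asprin tablets", "aspirin"): 0.82},
)
for row in adj:
    print(row)
# [0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0]
# [0.82, 0.0, 0.0]
```

The matrix is directed, as the claim speaks of edges "from" one term "to" another; whether the original system symmetrizes it before graph convolution is not stated here.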
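Claim 9, together with the structural embedding step of claim 10, might be realised as below. The sketch assumes that the "product" of the two node embeddings is element-wise, that hidden layers use ReLU and that the final nonlinear activation is a sigmoid. The weights are arbitrary toy values, not trained parameters, and the names `structural_embedding`, `dense`, `sigmoid` and `association_probability` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A dense layer is (weights[out][in], bias[out]); all values here are illustrative.
Layer = Tuple[List[List[float]], List[float]]


def structural_embedding(h_ext: Sequence[float], h_std: Sequence[float]) -> List[float]:
    """Element-wise (Hadamard) product of the two GCN node embeddings (assumed product type)."""
    return [a * b for a, b in zip(h_ext, h_std)]


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer without activation."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def association_probability(semantic: Sequence[float], structural: Sequence[float],
                            hidden_layers: Sequence[Layer], output_layer: Layer) -> float:
    x = list(semantic) + list(structural)           # concatenate the two representations
    for layer in hidden_layers:
        x = [max(0.0, v) for v in dense(x, layer)]  # fully connected hidden layer + ReLU
    logit = dense(x, output_layer)[0]               # single-node output layer
    return sigmoid(logit)                           # scalar association probability in (0, 1)


structural = structural_embedding([2.0, 3.0], [0.5, 1.0])  # [1.0, 3.0]
hidden = [([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0])]
output = ([[1.0, 1.0]], [0.0])
print(association_probability([1.0, -1.0], structural, hidden, output))  # sigmoid(1.0), about 0.731
```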
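The threshold step of the synonym mining described in claim 10 could look like the following minimal sketch. It assumes the synonym set classifier's output is already available as a probability per (mined term, synonym set) pair and that a mined term joins only its best-scoring set. The 0.9 default threshold, the function name `update_synonym_sets` and the example terms are illustrative assumptions.

```python
from typing import Dict, Set


def update_synonym_sets(
    synonym_sets: Dict[str, Set[str]],
    predictions: Dict[str, Dict[str, float]],
    threshold: float = 0.9,
) -> Dict[str, Set[str]]:
    """Return a copy of the synonym sets with mined terms added.

    synonym_sets: standard drug term -> its current synonym set
    predictions:  mined term -> {standard drug term: classifier probability}
    A mined term joins the single best-scoring set if that probability reaches the threshold.
    """
    updated = {std: set(members) for std, members in synonym_sets.items()}
    for term, scores in predictions.items():
        if not scores:
            continue
        best_std, best_p = max(scores.items(), key=lambda kv: kv[1])
        if best_p >= threshold:
            updated.setdefault(best_std, set()).add(term)
    return updated


library = {"aspirin": {"acetylsalicylic acid"}}
mined = {"ASA": {"aspirin": 0.97}, "aspirn": {"aspirin": 0.41}}
print(update_synonym_sets(library, mined)["aspirin"] == {"acetylsalicylic acid", "ASA"})  # True
```

Returning a copy rather than mutating the input keeps the pre-update library available, which is convenient when comparing results at different thresholds.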
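Claim 10 does not fix the similarity measure used to establish candidate associations. As a stand-in, the sketch below uses the character-level ratio from Python's standard `difflib.SequenceMatcher` and keeps the `top_k` best candidates at or above `min_similarity`. The measure, both parameters and the name `candidate_associations` are assumptions; the resulting scores are the kind of values claim 8 places in the adjacency matrix.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def candidate_associations(
    external_terms: Sequence[str],
    library_terms: Sequence[str],
    top_k: int = 3,
    min_similarity: float = 0.5,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each external term, keep the top_k library terms whose similarity >= min_similarity."""
    candidates: Dict[str, List[Tuple[str, float]]] = {}
    for ext in external_terms:
        scored = [(std, SequenceMatcher(None, ext, std).ratio()) for std in library_terms]
        scored = [pair for pair in scored if pair[1] >= min_similarity]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        candidates[ext] = scored[:top_k]
    return candidates


# 阿斯匹林 is a variant writing of 阿司匹林 (aspirin); 布洛芬 is ibuprofen.
result = candidate_associations(["阿斯匹林"], ["阿司匹林", "布洛芬"])
# {'阿斯匹林': [('阿司匹林', 0.75)]}: 3 of 4 characters match, ibuprofen shares none
```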
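One plausible layout of the associated drug term pair character sequence from the semantic embedding step of claim 10 is sketched below. It assumes BERT-style `[CLS]` and `[SEP]` tokens for the start and separator characters, character-level splitting of Chinese terms, pinyin supplied as a list of syllables, and the field order shown; the actual order and tokenization are not given in the claims, and `build_pair_sequence` is a hypothetical name. A real pre-trained model would then map these tokens to IDs with its own tokenizer.

```python
from typing import List, Sequence

CLS, SEP = "[CLS]", "[SEP]"  # assumed BERT-style start and separator tokens


def build_pair_sequence(ext_term: str, ext_pinyin: Sequence[str],
                        std_term: str, std_pinyin: Sequence[str]) -> List[str]:
    """[CLS] external chars [SEP] external pinyin [SEP] standard chars [SEP] standard pinyin [SEP]"""
    sequence = [CLS]
    for segment in (list(ext_term), list(ext_pinyin), list(std_term), list(std_pinyin)):
        sequence.extend(segment)
        sequence.append(SEP)
    return sequence


# 阿斯匹林 is a variant writing of 阿司匹林 (aspirin); both read "a si pi lin".
seq = build_pair_sequence("阿斯匹林", ["a", "si", "pi", "lin"], "阿司匹林", ["a", "si", "pi", "lin"])
print(len(seq), seq.count(SEP))  # 21 4
```

The example shows why pinyin is useful here: the variant differs from the standard term by one character, yet its pinyin segment is identical.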
CN202310567874.4A 2023-05-19 2023-05-19 Method and system for standardized association of drug terms in electronic medical records Active CN116312915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310567874.4A CN116312915B (en) 2023-05-19 2023-05-19 Method and system for standardized association of drug terms in electronic medical records

Publications (2)

Publication Number Publication Date
CN116312915A (en) 2023-06-23
CN116312915B (en) 2023-09-19

Family

ID=86781981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310567874.4A Active CN116312915B (en) 2023-05-19 2023-05-19 Method and system for standardized association of drug terms in electronic medical records

Country Status (1)

Country Link
CN (1) CN116312915B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544383A (en) * 2013-10-10 2014-01-29 中国中医科学院 Standard-term-based fast EMR (electronic medical record) entry system
US20140149103A1 (en) * 2010-05-26 2014-05-29 Warren Daniel Child Modular system and method for managing chinese, japanese, and korean linguistic data in electronic form
US9436760B1 (en) * 2016-02-05 2016-09-06 Quid, Inc. Measuring accuracy of semantic graphs with exogenous datasets
US20170024461A1 (en) * 2015-07-23 2017-01-26 International Business Machines Corporation Context sensitive query expansion
CN106383853A (en) * 2016-08-30 2017-02-08 刘勇 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis
CN106897568A (en) * 2017-02-28 2017-06-27 北京大数医达科技有限公司 Processing method and apparatus for medical record structuring
CN107665190A (en) * 2017-09-29 2018-02-06 李晓妮 Method and device for automatically constructing a text-proofreading error dictionary
CN111460175A (en) * 2020-04-08 2020-07-28 福州数据技术研究院有限公司 SNOMED-CT-based medical noun dictionary construction and expansion method
KR20200097949A (en) * 2019-02-11 2020-08-20 네이버 주식회사 Method and system for extracting synonym by using keyword relation structure
CN111986759A (en) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Method and system for analyzing electronic medical record, computer equipment and readable storage medium
CN113657109A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Method, apparatus and computer device for standardization of model-based clinical terminology
CN114091425A (en) * 2021-11-25 2022-02-25 北京富通东方科技有限公司 Medical entity alignment method and device
US20220108188A1 (en) * 2020-10-01 2022-04-07 International Business Machines Corporation Querying knowledge graphs with sub-graph matching networks
CN114417809A (en) * 2021-12-27 2022-04-29 北京滴普科技有限公司 Entity alignment method based on combination of graph structure information and text semantic model
WO2022088672A1 (en) * 2020-10-29 2022-05-05 平安科技(深圳)有限公司 Machine reading comprehension method and apparatus based on BERT, and device and storage medium
CN114444501A (en) * 2022-01-24 2022-05-06 荃豆数字科技有限公司 Method and device for searching traditional Chinese medicine decoction pieces, electronic equipment and storage medium
CN115374792A (en) * 2022-09-14 2022-11-22 山东省计算中心(国家超级计算济南中心) Policy text labeling method and system combining pre-training and graph neural network
WO2023065858A1 (en) * 2021-10-19 2023-04-27 之江实验室 Medical term standardization system and method based on heterogeneous graph neural network

Non-Patent Citations (3)

Title
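SHWETA TANEJA et al.: "A Text Preprocessing Approach for Efficacious Information Retrieval", Smart Innovations in Communication and Computational Sciences, vol. 669, page 13 *
张健; 冯飞; 刘宇; 马红烨: "Research on a web page ranking algorithm based on ontology concept similarity", 情报学报 (Journal of the China Society for Scientific and Technical Information), no. 11, pages 56-65 *
赵蒙月: "A corpus-comparison-based study of the acquisition of marked adversative complex sentences by native English speakers", 中国优秀硕士学位论文全文数据库 哲学与人文科学辑 (China Master's Theses Full-text Database, Philosophy and Humanities), no. 11, pages 084-699 *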

Also Published As

Publication number Publication date
CN116312915B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN110210037B (en) Syndrome-oriented medical field category detection method
CN111382272B (en) Electronic medical record ICD automatic coding method based on knowledge graph
CN110459287B (en) Structured report data from medical text reports
CN111540468B (en) ICD automatic coding method and system for visualizing diagnostic reasons
CN111834014A (en) Medical field named entity identification method and system
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN112232065B (en) Method and device for mining synonyms
CN111858940B (en) Multi-head attention-based legal case similarity calculation method and system
WO2017193685A1 (en) Method and device for data processing in social network
CN111950283B (en) Chinese word segmentation and named entity recognition system for large-scale medical text mining
CN111881292B (en) Text classification method and device
CN113378970B (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN113707339B (en) Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN111582506A (en) Multi-label learning method based on global and local label relation
CN115858886B (en) Data processing method, device, equipment and readable storage medium
CN115293161A (en) Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN111597330A (en) Intelligent expert recommendation-oriented user image drawing method based on support vector machine
CN112861538A (en) Entity linking method based on context semantic relation and document consistency constraint
CN111782818A (en) Device, method and system for constructing biomedical knowledge graph and memory
CN116956228A (en) Text mining method for technical transaction platform
CN116630062A (en) Medical insurance fraud detection method, system and storage medium
CN116312915B (en) Method and system for standardized association of drug terms in electronic medical records

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant