CN114238639A - Construction method and device of medical term standardized framework and electronic equipment - Google Patents


Publication number
CN114238639A
Authority
CN
China
Prior art keywords
term
medical
terms
model
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111574525.2A
Other languages
Chinese (zh)
Inventor
罗立刚
张旸
马睿
刘辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zero Krypton Medical Intelligent Technology Guangzhou Co ltd
Original Assignee
Zero Krypton Medical Intelligent Technology Guangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zero Krypton Medical Intelligent Technology Guangzhou Co ltd
Priority to CN202111574525.2A
Publication of CN114238639A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The embodiments of the present application provide a method and an apparatus for constructing a medical term standardization framework, an electronic device, and a storage medium. The method comprises: acquiring raw data of medical terms; classifying the raw data of the medical terms to obtain short term data and long term data; establishing a synonym library corresponding to the short term data; establishing a variant rule base corresponding to the short term data; establishing a recall model and a ranking model according to the long term data; and constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the ranking model. By implementing the embodiments of the present application, medical terms can be converted into standardized medical terms, and the conversion accuracy of the medical terms is improved through an iterative closed loop.

Description

Construction method and device of medical term standardized framework and electronic equipment
Technical Field
The application relates to the technical field of medical term information processing, in particular to a method and a device for constructing a medical term standardization framework, an electronic device and a computer-readable storage medium.
Background
Traditional medical term standardization usually adopts one of two approaches. The first is based on vocabulary mapping: a synonym library corresponding to the standard terms is established in advance, and the standard terms are obtained by looking up synonyms in the standardization stage. The second is model-based and uses a typical recall + ranking pipeline: in the inference stage, candidates are retrieved from a standard term library according to a similarity measure and are then ranked, and the best candidate is taken as the standard term.
However, both approaches have drawbacks. The vocabulary-mapping approach requires a large number of synonyms to be collected in advance, and its effectiveness is determined entirely by synonym coverage; for terms with longer names, the diversity of written expressions makes it difficult to enumerate all variants through prior synonym collection, so the approach performs poorly and conversion accuracy is low. The recall + ranking approach requires a large amount of labeled data in advance, and its effectiveness cannot be well guaranteed in some complex scenarios.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a computer-readable storage medium for constructing a medical term standardization framework, which can convert a medical term into a standardized medical term, so that the conversion accuracy of the medical term is improved.
In a first aspect, an embodiment of the present application provides a method for constructing a medical term standardization framework, where the method includes:
acquiring raw data of medical terms;
classifying the raw data of the medical terms to obtain short term class data and long term class data;
establishing a synonym library corresponding to the short term class data;
establishing a variant rule base corresponding to the short term class data;
establishing a recall model and a ranking model according to the long term class data;
constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the ranking model.
In the implementation process, different standardization rules are established for different types of medical terms, so that each type can be standardized and corrected by a suitable method and yields a corresponding standardized medical term, which improves the recognition accuracy of the medical terms.
Further, the step of establishing a synonym library corresponding to the short term class data includes:
acquiring short terms in the short term class data;
and extracting synonym word frequency information corresponding to the short terms, and establishing the synonym library according to the synonym word frequency information.
In the implementation process, the synonym library is established from the synonym word frequency information corresponding to the short terms, so that each short term can find its corresponding standardized short term in the synonym library, which makes the short terms easy to correct and improves accuracy.
Further, the step of establishing the synonym library according to the synonym word frequency information includes:
and performing vocabulary mapping on the short terms according to the synonym word frequency information to obtain the synonym library.
In the implementation process, the synonym word frequency information reflects the characteristics of the short terms; using it as the confidence basis improves the accuracy of converting short terms into standardized short terms.
Further, the step of establishing a variant rule base corresponding to the short term class data includes:
acquiring a variant rule;
and performing variant error correction on the short terms according to the variant rule to obtain the variant rule base.
In the implementation process, variant error correction converts the short terms into standardized short terms, so that each short term expresses the real medical information.
Further, the step of establishing a recall model and a ranking model according to the long term class data includes:
acquiring a standard term library and long terms in the long term class data;
performing recall modeling on the long terms according to the standard term library to obtain a recall model;
matching the long terms with the standard terms in the standard term library according to the recall model to obtain candidate standard terms;
and performing ranking modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a ranking model.
In the implementation process, the recall model and the ranking model convert long terms into standardized long terms, which reduces the complexity of manual conversion, saves resources, lowers labor cost, and improves the accuracy of standardized conversion of the long terms.
Further, the step of performing recall modeling on the long terms according to the standard term library to obtain a recall model includes:
pairing the long terms with standard terms in the standard term library;
inputting the paired long terms and standard terms into a machine learning model for training to obtain the recall model.
In the implementation process, pairing the long terms with the standard terms in the standard term library improves the accuracy with which the recall model converts long terms, so that the recall model better fits the standardized conversion characteristics of medical long terms.
Further, after the step of performing ranking modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a ranking model, the method further includes:
and ranking the candidate standard terms according to the ranking model to obtain ranked candidate standard terms.
In the implementation process, ranked candidate standard terms are obtained, which facilitates selection of the standardized long term.
In a second aspect, an embodiment of the present application provides an apparatus for constructing a medical term standardization framework, where the apparatus includes:
an acquisition module, used for acquiring raw data of medical terms;
a classification module, used for classifying the raw data of the medical terms to obtain short term class data and long term class data;
a synonym library establishing module, used for establishing a synonym library corresponding to the short term class data;
a variant rule base establishing module, used for establishing a variant rule base corresponding to the short term class data;
a model establishing module, used for establishing a recall model and a ranking model according to the long term class data;
and a framework construction module, used for constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the ranking model.
In the implementation process, different standardization rules are established for different types of medical terms, so that each type can be standardized and corrected by a suitable method and yields a corresponding standardized medical term, which greatly improves the recognition rate of the medical terms.
In a third aspect, an electronic device provided in an embodiment of the present application includes: memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method according to any one of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform the method according to any one of the first aspect.
Additional features and advantages of the disclosure will be set forth in the description that follows, or may be learned in part by practicing the techniques of the disclosure described above.
The present application can be implemented in accordance with the content of the specification; preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a method for constructing a medical term standardization framework provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an apparatus for constructing a medical term standardization framework provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Example one
Fig. 1 is a schematic flow chart of a method for constructing a medical term standardization framework provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
S1, acquiring raw data of medical terms;
S2, classifying the raw data of the medical terms to obtain short term class data and long term class data;
S3, establishing a synonym library corresponding to the short term class data;
S4, establishing a variant rule base corresponding to the short term class data;
S5, establishing a recall model and a ranking model according to the long term class data;
S6, constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the ranking model.
In the implementation process, different standardization rules are established for different types of medical terms, so that each type can be standardized and corrected by a suitable method and yields a corresponding standardized medical term, which greatly improves the recognition rate of the medical terms.
Medical terms are the specialized terms used in the medical field to refer to various things, phenomena, characteristics, relationships, processes, and the like (e.g., diseases, drugs, surgical operations, and examinations). These terms are essential for expressing medical information in clinical information systems. In the embodiments of the present application, the raw data of medical terms is data containing various medical terms; it can be downloaded from a medical database or collected from the medical record data of hospitals. Before data standardization, the raw data contains many non-standard entries, such as medical aliases and synonyms, that cannot be unified. Raw data that has not been standardized is difficult to use in subsequent medical applications, which wastes the data.
Further, the raw data of medical terms is divided into short term class data and long term class data. The short term class data consists of medical terms in the form of words and phrases, mostly names of diseases and drugs. The long term class data consists of medical terms presented in the form of sentences, mostly relating to the diagnosis, examination, and the like of diseases. Because the two classes have different characteristics, they are standardized in different ways to maximize the accuracy of the resulting standardized medical terms.
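The split into the two classes can be sketched as follows. The source does not state the classification criterion, so the character-length cutoff `max_short_len` and the function name `classify_terms` are assumptions made purely for illustration.

```python
from typing import Iterable, List, Tuple


def classify_terms(raw_terms: Iterable[str],
                   max_short_len: int = 12) -> Tuple[List[str], List[str]]:
    """Split raw terms into (short_terms, long_terms).

    max_short_len is a hypothetical cutoff; the source does not specify one.
    """
    short_terms: List[str] = []
    long_terms: List[str] = []
    for term in raw_terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        if len(term) <= max_short_len:
            short_terms.append(term)   # word- or phrase-like term
        else:
            long_terms.append(term)    # sentence-like term
    return short_terms, long_terms
```

A real system would more likely decide by term type (drug or disease name versus diagnosis or operation description) than by raw length.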
Further, S3 includes:
acquiring short terms in the short term data;
and extracting synonym frequency information corresponding to the short terms, and establishing a synonym library according to the synonym frequency information.
In the implementation process, the synonym library is established through the synonym frequency information corresponding to the short terms, so that each short term can find the corresponding standardized short term in the synonym library, the short term is convenient to correct, and the accuracy can be effectively improved.
Further, the step of establishing a synonym library according to the synonym frequency information comprises the following steps:
and performing word list mapping on the short terms according to the synonym word frequency information to obtain a synonym library.
In clinical medicine, owing to the complexity of medical diagnosis and individual differences, the same disease may be associated with different drugs. In the embodiments of the present application, information that may cause such differences is fully considered when sorting out the similarity between medical terms. Synonym word frequency information is therefore selected as the feature for measuring the similarity between short terms: among the many features that can measure short term similarity, features that do not introduce such differences are chosen as the standard, for example the way the medical term is expressed and the standardized term corresponding to the medical term. For example, to determine the character similarity between the raw data of medical terms and the standardized medical terms more accurately, the similarity of short terms may be determined according to the expression of the raw medical term and the standardized term corresponding to it, respectively.
Illustratively, the standardized medical term corresponding to the short terms "tuberculous pneumonia" and "tuberculosis" is "tuberculosis"; that is, when such a short term appears, the medical concept it expresses is tuberculosis, and these short terms are synonyms of one another. The synonym word frequency information of the short terms is extracted; it is lexical information containing the main features of a short term, from which the standardized term that the short term is a synonym of can be determined. Using the synonym word frequency information as the confidence basis for vocabulary mapping, each short term can be mapped to its corresponding standardized term, which yields the synonym library.
In the implementation process, the synonym word frequency information can reflect the characteristics of the short term words, and the synonym word frequency information is used as a confidence basis, so that the accuracy rate of converting the short term words into the standardized short term words can be improved.
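A minimal sketch of frequency-based vocabulary mapping is given below. It assumes the input is a list of observed (raw short term, standard term) pairs and uses the share of observations as the confidence value; the function name `build_synonym_library` and the toy observations are hypothetical and do not come from the source.

```python
from collections import Counter, defaultdict


def build_synonym_library(pairs):
    """Map each raw short term to (most frequent standard term, confidence).

    pairs: iterable of (raw_short_term, standard_term) observations.
    Confidence is the fraction of observations that agree with the choice.
    """
    counts = defaultdict(Counter)
    for raw, standard in pairs:
        counts[raw][standard] += 1

    library = {}
    for raw, counter in counts.items():
        standard, freq = counter.most_common(1)[0]
        library[raw] = (standard, freq / sum(counter.values()))
    return library


# Toy observations, invented for illustration only.
observations = [
    ("tuberculous pneumonia", "tuberculosis"),
    ("tuberculous pneumonia", "tuberculosis"),
    ("tuberculous pneumonia", "pneumonia"),
]
synonym_library = build_synonym_library(observations)
```

Mappings with low confidence could be routed to manual confirmation rather than applied automatically.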
Further, S4 includes:
acquiring a variant rule;
and performing variant error correction on the short terms according to the variant rules to obtain a variant rule base.
In the implementation process, variant error correction converts the short terms into standardized short terms, so that each short term expresses the real medical information.
Illustratively, because of differences in how terms are written, errors can arise when short terms are entered. For example, the drug name "aspirin" may be miswritten as a variant form of "aspirin"; after variant error correction, the miswritten short term is converted into the standardized short term "aspirin" and expresses the real information.
The variant rules include rules such as initial consonant variants, homophone variants, and glyph variants. Each short term is converted into its corresponding standardized short term according to the variant rules. Table 1 shows several different short terms and the standardized short terms obtained after error correction with the variant rules.
TABLE 1 Variant error correction conversion table for short terms
[Table 1 appears as images in the original publication: Figure BDA0003424818790000081 and Figure BDA0003424818790000091]
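Rule-based variant correction can be sketched as below. The source's rules concern initial consonant, homophone, and glyph variants of Chinese terms and their contents are not reproduced in the text, so the two glyph-style substitution rules here, the name `correct_variant`, and the misspelled inputs are all hypothetical stand-ins.

```python
import re

# Hypothetical rules: (pattern for a variant form, canonical replacement).
VARIANT_RULES = [
    (re.compile(r"0"), "o"),  # glyph variant: digit zero typed for letter o
    (re.compile(r"1"), "i"),  # glyph variant: digit one typed for letter i
]


def correct_variant(term, standard_terms, rules=VARIANT_RULES):
    """Apply the rules in order; return the first standard term reached, else None."""
    if term in standard_terms:
        return term
    candidate = term
    for pattern, replacement in rules:
        candidate = pattern.sub(replacement, candidate)
        if candidate in standard_terms:
            return candidate
    return None  # no rule produced a known standard term


standard_terms = {"aspirin", "tuberculosis"}
corrected = correct_variant("asp1r1n", standard_terms)
```

Accepting a correction only when it lands in the standard vocabulary keeps the rules from rewriting terms that are merely unfamiliar.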
Further, S5 includes:
acquiring a standard term library and long terms in the long term class data;
performing recall modeling on the long terms according to the standard term library to obtain a recall model;
matching the long terms with the standard terms in the standard term library according to the recall model to obtain candidate standard terms;
and performing ranking modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a ranking model.
In the implementation process, the recall model and the ranking model convert long terms into standardized long terms, which reduces the complexity of manual conversion, saves manpower and material resources, and improves the accuracy of standardized conversion of the long terms.
Further, the step of recalling and modeling the long term according to the standard term library to obtain a recall model comprises the following steps:
matching the long term with the standard term in the standard term library;
inputting the matched long term and standard term into a machine learning model for training to obtain a recall model.
For long terms in medicine, because information to be expressed is complex, the variant rule base and the synonym base cannot realize standardized conversion of the long terms, and a recall model needs to be constructed.
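The recall step can be illustrated with a simple stand-in. The source trains a machine learning model on paired long terms and standard terms; the sketch below instead uses untrained character trigram Jaccard similarity, which is a swapped-in technique chosen only to show the shape of candidate recall. The function names and the `top_n` parameter are assumptions.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recall_candidates(long_term, standard_terms, top_n=2):
    """Return up to top_n standard terms most similar to long_term."""
    query = char_ngrams(long_term)
    scored = [(jaccard(query, char_ngrams(s)), s) for s in standard_terms]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [s for score, s in scored[:top_n] if score > 0]


standard_library = [
    "adrenal lesion resection",
    "unilateral adrenal resection",
    "vitreous silicone oil extraction",
]
candidates = recall_candidates("right adrenal giant tumor resection", standard_library)
```

In a trained recall model the similarity function would be learned from the paired data rather than fixed as it is here.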
Further, after the step of performing ranking modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a ranking model, the method further includes:
and ranking the candidate standard terms according to the ranking model to obtain ranked candidate standard terms.
The recall model and the ranking model need to be trained on the long terms over multiple rounds to improve the standardized conversion rate of the long terms. Optionally, in each round, candidates are recalled for the long term from the standard term library to obtain candidate standard terms, and the candidate standard terms are then ranked by similarity, where the similarity is that between a candidate standard term from the standard term library and the long term. The ranking identifies the candidate closest to the long term, i.e., the one with the highest similarity. Optionally, the TOP-K (K = 1, 2, 3, ...) candidate standard terms may be selected for training, where TOP-K denotes the K candidate standard terms with the highest similarity among the ranked candidates, so that the trained ranking model better fits the characteristics of the long terms and the standardization result is improved. The selected TOP-K candidate standard terms are then confirmed to ensure the accuracy of each candidate and that each is the closest match in the standard term library.
Illustratively, when K = 2, examples of a long term, its corresponding standardized long term in the standard term library, and the selected TOP-2 candidate standard terms are as follows. Long term: right adrenal giant tumor resection; standardized long term: adrenal lesion resection; selected TOP-2 candidate standard terms: unilateral adrenal resection and adrenal gland partial resection. Long term: silicone oil removal from the right eye combined with intraocular lens phase II implantation; standardized long term: vitreous silicone oil extraction and intraocular artificial lens secondary implantation; selected TOP-2 candidate standard terms: vitreous puncture drawing and intraocular artificial lens secondary implantation.
In the implementation process, the accuracy of converting long terms of the recall model and the ranking model can be improved by selecting candidate standard terms of TOP-K, so that the recall model and the ranking model can be more suitable for the standardized conversion characteristics of medical long terms.
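TOP-K selection over recalled candidates can be sketched as follows. The source uses a trained ranking model; here `difflib.SequenceMatcher` from the Python standard library stands in for the learned similarity score, and the names `similarity` and `top_k_candidates` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; a trained ranker would replace this."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_k_candidates(long_term, candidates, k=2):
    """Rank candidates by similarity to long_term and keep the K best."""
    ranked = sorted(candidates, key=lambda c: similarity(long_term, c), reverse=True)
    return ranked[:k]


top2 = top_k_candidates(
    "right adrenal giant tumor resection",
    ["unilateral adrenal resection", "adrenal gland partial resection",
     "vitreous puncture drawing"],
    k=2,
)
```

The selected TOP-K list is what would be passed to the confirmation step described above.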
In the embodiments of the present application, the medical term standardization framework is constructed from the recall model and the ranking model in combination with a manual confirmation step, and supports an iterative closed loop for medical term standardization. Non-standardized medical terms can be converted into standardized medical terms, and the conversion accuracy is improved through the iterative closed loop.
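One way the components might be wired together, including the manual confirmation loop, is sketched below. The class `TermStandardizer`, its fields, and the fall-through order (synonym library, then variant correction, then recall and ranking) are assumptions for illustration; the source routes short terms and long terms to different components and does not describe this exact control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TermStandardizer:
    synonym_library: Dict[str, str]
    correct_variant: Callable[[str], Optional[str]]   # variant rule base lookup
    recall_and_rank: Callable[[str], List[str]]       # recall + ranking models
    review_queue: List[tuple] = field(default_factory=list)

    def standardize(self, term: str) -> Optional[str]:
        # 1. Direct lookup in the synonym library.
        if term in self.synonym_library:
            return self.synonym_library[term]
        # 2. Rule-based variant error correction.
        corrected = self.correct_variant(term)
        if corrected is not None:
            return corrected
        # 3. Model-based candidates, queued for manual confirmation.
        candidates = self.recall_and_rank(term)
        self.review_queue.append((term, candidates))
        return candidates[0] if candidates else None

    def confirm(self, term: str, standard: str) -> None:
        """Feed a manually confirmed mapping back in, closing the loop."""
        self.synonym_library[term] = standard
```

Confirmed mappings enter the synonym library, so the same term is resolved by direct lookup on the next pass.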
Further, before the step of acquiring the raw data of the medical term, the method further comprises:
and cleaning the medical term raw data to remove the symbolic data in the medical term raw data.
In the implementation process, the original data of the medical terms are cleaned in advance to remove the symbolic data, so that errors caused by the symbolic data in the standardized conversion of the medical terms are avoided, and the efficiency of converting the medical terms into the standardized medical terms is improved.
Specifically, the raw data of the medical terms that needs to be standardized is obtained and cleaned, which includes unifying letter case, removing spaces and special symbols, removing words without actual business meaning, and processing negation words; the cleaned raw data of the medical terms is then standardized.
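The cleaning operations can be sketched as below. The stop-word set `STOP_WORDS` and the function name `clean_term` are hypothetical, redundant whitespace is collapsed rather than removed entirely so that English examples stay readable, and negation handling is omitted because the source does not describe how it is done.

```python
import re

# Hypothetical words treated as carrying no business meaning.
STOP_WORDS = {"nos", "unspecified"}


def clean_term(raw: str) -> str:
    """Lowercase, strip special symbols, drop stop words, collapse whitespace."""
    text = raw.lower()                      # unify letter case
    text = re.sub(r"[^\w\s]", " ", text)    # replace special symbols with spaces
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_term("  Tuberculosis, NOS!! ")
```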
Example two
In order to perform a corresponding method of the above-described embodiments to achieve corresponding functions and technical effects, there is provided a construction apparatus of a medical term standardization framework, as shown in fig. 2, the apparatus including:
an acquisition module 1, configured to acquire raw data of medical terms;
the classification module 2 is used for classifying the original data of the medical terms to obtain short term data and long term data;
a synonym library establishing module 3, which is used for establishing a synonym library corresponding to the short term data;
a variant rule base establishing module 4, configured to establish a variant rule base corresponding to the short term class data;
the model building module 5 is used for establishing a recall model and a ranking model according to the long term class data;
and the framework construction module 6 is used for constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the ranking model.
In the implementation process, different standardization rules are established for different medical term types, so that different types of medical terms can be standardized and corrected by a proper method, and each type of medical term can generate a corresponding standardized medical term, so that the recognition rate of the medical terms is greatly improved.
Further, the synonym library establishing module 3 is further configured to:
acquiring short terms in the short term class data;
and extracting synonym word frequency information corresponding to the short terms, and establishing a synonym library according to the synonym word frequency information.
Further, the synonym library establishing module 3 is further configured to:
and performing vocabulary mapping on the short terms according to the synonym word frequency information to obtain a synonym library.
Further, the variant rule base establishing module 4 is further configured to:
acquiring a variant rule;
and performing variant error correction on the short terms according to the variant rule to obtain a variant rule base.
Further, the model building module 5 is further configured to:
acquiring a standard term library and long terms in the long term class data;
performing recall modeling on the long terms according to the standard term library to obtain a recall model;
matching the long terms with the standard terms in the standard term library according to the recall model to obtain candidate standard terms;
and performing ranking modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a ranking model.
Further, the model building module 5 is further configured to:
pairing the long terms with the standard terms in the standard term library;
inputting the paired long terms and standard terms into a machine learning model for training to obtain a recall model.
Further, the model building module 5 is further configured to:
and ranking the candidate standard terms according to the ranking model to obtain ranked candidate standard terms.
The above-mentioned construction device of the medical term standardization framework can implement the method of the first embodiment. The alternatives in the first embodiment are also applicable to the present embodiment, and are not described in detail here.
The rest of the embodiments of the present application may refer to the contents of the first embodiment, and in this embodiment, details are not repeated.
Example three
The embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory is used for storing a computer program, and the processor runs the computer program to make the electronic device execute the method for constructing the medical term standardization framework of the first embodiment.
Alternatively, the electronic device may be a server.
Referring to fig. 3, fig. 3 is a schematic structural composition diagram of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 31, a communication interface 32, a memory 33, and at least one communication bus 34. Wherein the communication bus 34 is used for realizing direct connection communication of these components. The communication interface 32 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The processor 31 may be an integrated circuit chip having signal processing capabilities.
The processor 31 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 31 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor 31 may be any conventional processor or the like.
The memory 33 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 33 stores computer-readable instructions which, when executed by the processor 31, enable the electronic device to perform the steps of the method embodiment of fig. 1 described above.
Optionally, the electronic device may further include a memory controller and an input/output unit. The memory 33, the memory controller, the processor 31, the peripheral interface, and the input/output unit are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses 34. The processor 31 is adapted to execute executable modules stored in the memory 33, such as software functional modules or computer programs comprised by the device.
The input/output unit allows a user to create a task and to set an optional start period or a preset execution time for that task, thereby enabling interaction between the user and the server. The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
It will be appreciated that the configuration shown in fig. 3 is merely illustrative and that the electronic device may include more or fewer components than shown in fig. 3 or have a different configuration than shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
In addition, the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for constructing the medical term standardization framework of the first embodiment.
Embodiments of the present application further provide a computer program product, which when running on a computer, causes the computer to execute the method described in the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above describes only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of constructing a medical term standardization framework, the method comprising:
acquiring raw data of medical terms;
classifying the raw data of the medical terms to obtain short term class data and long term class data;
establishing a synonym library corresponding to the short term class data;
establishing a variant rule base corresponding to the short term class data;
establishing a recall model and a sequencing model according to the long term class data;
constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the sequencing model.
2. The method for constructing a medical term standardization framework according to claim 1, wherein the step of establishing a synonym library corresponding to the short term class data comprises:
acquiring short terms in the short term class data;
extracting synonym word frequency information corresponding to the short terms;
and establishing the synonym library according to the synonym word frequency information.
3. The method for constructing a medical term standardization framework according to claim 2, wherein the step of establishing the synonym library according to the synonym word frequency information comprises:
and performing word list mapping on the short terms according to the synonym word frequency information to obtain the synonym library.
4. The method for constructing a medical term standardization framework according to claim 1, wherein the step of establishing a variant rule base corresponding to the short term class data comprises:
acquiring a variant rule;
and performing variant error correction on the short terms according to the variant rule to obtain the variant rule base.
5. The method for constructing a medical term standardization framework according to claim 1, wherein the step of establishing a recall model and a sequencing model according to the long term class data comprises:
acquiring a standard term library and long terms in the long term class data;
performing recall modeling on the long terms according to the standard term library to obtain a recall model;
matching the long terms with standard terms in the standard term library according to the recall model to obtain candidate standard terms;
and performing sequencing modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a sequencing model.
6. The method for constructing a medical term standardization framework according to claim 5, wherein the step of performing recall modeling on the long terms according to the standard term library to obtain a recall model comprises:
pairing the long terms with standard terms in the standard term library;
inputting the paired long terms and standard terms into a machine learning model for training to obtain the recall model.
7. The method for constructing a medical term standardization framework according to claim 5, wherein after the step of performing sequencing modeling according to the candidate standard terms and the standard terms corresponding to the long terms to obtain a sequencing model, the method further comprises:
and sequencing the candidate standard terms according to the sequencing model to obtain the sequenced candidate standard terms.
8. An apparatus for constructing a medical term standardization framework, the apparatus comprising:
an acquisition module for acquiring raw data of medical terms;
a classification module for classifying the raw data of the medical terms to obtain short term class data and long term class data;
a synonym library establishing module for establishing a synonym library corresponding to the short term class data;
a variant rule base establishing module for establishing a variant rule base corresponding to the short term class data;
a model establishing module for establishing a recall model and a sequencing model according to the long term class data;
and a framework construction module for constructing a medical term standardization framework according to the synonym library, the variant rule base, the recall model and the sequencing model.
9. An electronic device, comprising a memory for storing a computer program and a processor for executing the computer program to cause the electronic device to perform the method of constructing a medical term standardization framework according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, implements the method of construction of a medical term standardization framework as defined in any one of claims 1 to 7.
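The short-term branch of the claimed method (classification, the synonym library built from word frequency information, and variant error correction) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the length threshold `MAX_SHORT_LEN`, the regular-expression form of the variant rules, the choice of the most frequent mapping as the synonym entry, and all function names and example terms are hypothetical and are not specified in the claims.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Pattern, Sequence, Tuple

MAX_SHORT_LEN = 12  # hypothetical cut-off between short and long terms


def classify(terms: Iterable[str], max_short_len: int = MAX_SHORT_LEN) -> Tuple[List[str], List[str]]:
    """Split raw terms into short term class data and long term class data by length."""
    short_terms, long_terms = [], []
    for term in terms:
        (short_terms if len(term) <= max_short_len else long_terms).append(term)
    return short_terms, long_terms


def build_synonym_library(observations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each raw short term to the standard term it was most frequently mapped to."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for raw, standard in observations:
        counts[raw][standard] += 1
    return {raw: counter.most_common(1)[0][0] for raw, counter in counts.items()}


# Hypothetical variant rules, applied in order to a lower-cased term.
VARIANT_RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"-"), " "),      # hyphen variants
    (re.compile(r"\s+"), " "),    # repeated whitespace
    (re.compile(r"\bii\b"), "2"), # roman numeral variant
]


def correct_variants(term: str, rules: Sequence[Tuple[Pattern[str], str]] = VARIANT_RULES) -> str:
    """Apply the variant rules to a short term."""
    term = term.strip().lower()
    for pattern, replacement in rules:
        term = pattern.sub(replacement, term)
    return term


def standardize_short(term: str, synonyms: Dict[str, str]) -> str:
    """Correct variants first, then look the result up in the synonym library."""
    corrected = correct_variants(term)
    return synonyms.get(corrected, corrected)


# Hypothetical example data: (raw short term, standard term) mappings seen in practice.
observations = [
    ("t2dm", "type 2 diabetes mellitus"),
    ("t2dm", "type 2 diabetes mellitus"),
    ("t2dm", "diabetes mellitus"),
    ("type 2 diabetes", "type 2 diabetes mellitus"),
]
synonyms = build_synonym_library(observations)
short_terms, long_terms = classify(["t2dm", "diabetes mellitus type 2 with neuropathy"])
print(standardize_short("Type  II-Diabetes", synonyms))
```

Running variant correction before the synonym lookup means the synonym library only needs entries for cleaned forms, not for every spelling variant.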

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111574525.2A CN114238639A (en) 2021-12-21 2021-12-21 Construction method and device of medical term standardized framework and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111574525.2A CN114238639A (en) 2021-12-21 2021-12-21 Construction method and device of medical term standardized framework and electronic equipment

Publications (1)

Publication Number Publication Date
CN114238639A true CN114238639A (en) 2022-03-25

Family

ID=80760646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111574525.2A Pending CN114238639A (en) 2021-12-21 2021-12-21 Construction method and device of medical term standardized framework and electronic equipment

Country Status (1)

Country Link
CN (1) CN114238639A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150382A (en) * 2023-04-19 2023-05-23 北京亚信数据有限公司 Method and device for determining standardized medical terms

Similar Documents

Publication Publication Date Title
CN106919793B (en) Data standardization processing method and device for medical big data
CN109564589B (en) Entity identification and linking system and method using manual user feedback
CN107644011B (en) System and method for fine-grained medical entity extraction
CN112786194A (en) Medical image diagnosis guide inspection system, method and equipment based on artificial intelligence
CN112015917A (en) Data processing method and device based on knowledge graph and computer equipment
CN107408156A (en) For carrying out semantic search and the system and method for extracting related notion from clinical document
CN111061841A (en) Knowledge graph construction method and device
US11537788B2 (en) Methods, systems, and storage media for automatically identifying relevant chemical compounds in patent documents
CN112541066A (en) Text-structured-based medical and technical report detection method and related equipment
JP2019032704A (en) Table data structuring system and table data structuring method
Zuccon et al. De-identification of health records using Anonym: Effectiveness and robustness across datasets
CN114913942A (en) Intelligent matching method and device for patient recruitment projects
CN114358001A (en) Method for standardizing diagnosis result, and related device, equipment and storage medium thereof
CN111785383A (en) Data processing method and related equipment
CN111597789A (en) Electronic medical record text evaluation method and equipment
US9881004B2 (en) Gender and name translation from a first to a second language
CN114238639A (en) Construction method and device of medical term standardized framework and electronic equipment
CN111400529B (en) Data processing method and device
CN113343680A (en) Structured information extraction method based on multi-type case history texts
CN110287270B (en) Entity relationship mining method and equipment
CN112735545A (en) Self-training method, model, processing method, device and storage medium
CN111984694A (en) Orthopedics search engine system
Amador-Domínguez et al. A case-based reasoning model powered by deep learning for radiology report recommendation
JP2017134693A (en) Meaning information registration support program, information processor and meaning information registration support method
CN116469526A (en) Training method, device, equipment and storage medium for traditional Chinese medicine diagnosis model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination