CN111523309A - Medicine information normalization method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111523309A
Authority
CN
China
Prior art keywords
field
standard
independent
combined
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010306577.0A
Other languages
Chinese (zh)
Inventor
张黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yiyiyun Technology Co ltd
Original Assignee
Beijing Yiyiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yiyiyun Technology Co ltd filed Critical Beijing Yiyiyun Technology Co ltd
Priority to CN202010306577.0A priority Critical patent/CN111523309A/en
Publication of CN111523309A publication Critical patent/CN111523309A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/40ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Toxicology (AREA)
  • Artificial Intelligence (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the disclosure provides a medicine information normalization method and device, a computer readable medium and electronic equipment. The method comprises the following steps: extracting a plurality of field information of the medicine information from the medical record data, and determining a combined field formed by the plurality of field information; according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field; identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalizing the combined field according to the standard field corresponding to each independent field. Before normalization processing is carried out, standard field identification of a combined field and an independent field is carried out on medical record data, and the efficiency and accuracy of normalization processing are improved.

Description

Medicine information normalization method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies and information processing technologies, and in particular, to a method and an apparatus for drug information normalization, a storage medium, and an electronic device.
Background
In today's era of electronic informatization, medical institutions generate large volumes of medical data every day. As the degree of informatization in the medical industry continues to improve, the data volume of electronic medical record systems keeps growing, and traditional database-based data processing has begun to suffer from problems such as low speed, a large number of rules, and difficult maintenance. The medical industry therefore needs to introduce artificial intelligence means such as machine learning and deep learning to process data that manual work cannot cover, so as to improve efficiency. However, the results given by these artificial intelligence means often suffer from interpretability problems. For example, taking medical order data as an example, machine learning and deep learning algorithm modules can integrate information from multiple fields and give a confidence (probability) for a certain standardized result.
To address the interpretability problem of the algorithm and improve its accuracy, the related art generally adopts the following methods: 1. The original data to be standardized in a hospital database is counted and classified, and high-frequency and common data are labeled manually. 2. Technical personnel design regular-expression rules to implement data cleaning and generalized matching, with the rules embodied as Structured Query Language (SQL) statements or processing functions. 3. In the data display stage, the standardized medical records are used to provide data support.
In the above techniques, high-frequency and common data must be labeled manually and various regular expressions must be designed, which leads to high labor cost, low efficiency, a tendency toward errors, and increased technical difficulty. In addition, the overall scope of application of the standardization is narrow, interpretability is poor, and the normalization results are inaccurate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for drug information normalization, a computer readable medium and an electronic device, so that the efficiency and accuracy of drug information normalization are improved at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of an embodiment of the present disclosure, there is provided a method for normalizing drug information, including: extracting a plurality of field information of the medicine information from the medical record data, and determining a combined field formed by the plurality of field information; according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field; identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalizing the combined field according to the standard field corresponding to each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the method further includes: when it is determined that the second dictionary does not contain a standard field for every independent field, acquiring the independent fields for which no corresponding standard field exists in the second dictionary; extracting field participles (word segments) from those independent fields; identifying, according to a third dictionary and the field participles, the standard fields corresponding to the field participles; and determining the standard field corresponding to each independent field according to the standard fields corresponding to the field participles.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining the standard field corresponding to each independent field according to the standard field corresponding to the field participle includes: if some independent fields have corresponding standard fields in the second dictionary, determining the standard field corresponding to each independent field according to the standard fields found in the second dictionary for those independent fields and the standard fields corresponding to the field participles of the independent fields that have no standard field in the second dictionary; and if no independent field has a corresponding standard field in the second dictionary, determining the standard field corresponding to each independent field according to the standard fields corresponding to the field participles of each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the method further includes: when it is determined that, for an independent field having no corresponding standard field in the second dictionary, no standard field corresponding to its field participles exists in the third dictionary, determining the standard field of the independent field based on the initial field of the independent field.
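The fallback chain described in the preceding paragraphs (second dictionary, then third dictionary over field participles, then the initial field) can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the dictionary contents, the whitespace-based `segment` helper, and the rule of taking the first participle found in the third dictionary are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical dictionary contents, for illustration only.
SECOND_DICT: Dict[str, str] = {
    "Salbutamol atomized solution": "Salbutamol sulfate solution for inhalation",
}
THIRD_DICT: Dict[str, str] = {
    "aspirin": "Aspirin",
}


def segment(field_value: str) -> List[str]:
    """Assumed segmenter: lowercase the value and split it on whitespace."""
    return field_value.lower().split()


def resolve_independent_field(value: str) -> str:
    """Resolve one independent field to a standard field.

    Order of attempts: whole-field lookup in the second dictionary,
    participle lookup in the third dictionary, then the initial value.
    """
    if value in SECOND_DICT:
        return SECOND_DICT[value]
    for participle in segment(value):
        # Assumption: the first participle with an entry decides the result.
        if participle in THIRD_DICT:
            return THIRD_DICT[participle]
    # No dictionary could standardize the field: keep its initial value.
    return value
```

A real system would likely use a proper word segmenter for Chinese text and a more careful rule for combining several matched participles.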
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, extracting multiple pieces of field information of the drug information from the medical record data, and determining a combined field composed of the multiple pieces of field information includes: extracting a plurality of field information of the medicine information from the medical record data; and splicing the fields by using a connector, and determining a combined field formed by the information of the fields.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the normalizing the combined field according to the standard field corresponding to each independent field includes: acquiring a learning model for normalization processing; and normalizing the combined field based on the learning model and the standard field corresponding to each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, normalizing the combined field based on the learning model and the standard field corresponding to each independent field includes: splicing the standard fields corresponding to each independent field based on the input format of the learning model to generate a standard combined field corresponding to the combined field; and normalizing the combined field based on the learning model and the standard combined field.
According to an aspect of an embodiment of the present disclosure, there is provided an apparatus for normalizing drug information, including: the extraction module is configured to extract a plurality of pieces of field information of the medicine information from the medical record data and determine a combined field formed by the plurality of pieces of field information; a determining module configured to identify a standard field corresponding to the combined field according to a first dictionary and the combined field, and determine a plurality of independent fields corresponding to the combined field when it is determined that the standard field corresponding to the combined field does not exist in the first dictionary; a recognition module configured to recognize a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and the normalization module is configured to normalize the combined field according to the standard field corresponding to each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the recognition module includes: an obtaining unit configured to, when it is determined that the second dictionary does not contain a standard field for every independent field, obtain the independent fields for which no corresponding standard field exists in the second dictionary; an extraction unit configured to extract field participles from those independent fields; a recognition unit configured to recognize the standard fields corresponding to the field participles according to a third dictionary and the field participles; and a determining unit configured to determine the standard field corresponding to each independent field according to the standard fields corresponding to the field participles.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the determining unit is configured to: if some independent fields have corresponding standard fields in the second dictionary, determine the standard field corresponding to each independent field according to the standard fields found in the second dictionary for those independent fields and the standard fields corresponding to the field participles of the independent fields that have no standard field in the second dictionary; and if no independent field has a corresponding standard field in the second dictionary, determine the standard field corresponding to each independent field according to the standard fields corresponding to the field participles of each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the determining unit is further configured to, when it is determined that, for an independent field having no corresponding standard field in the second dictionary, no standard field corresponding to its field participles exists in the third dictionary, determine the standard field of the independent field based on the initial field of the independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the extracting module is configured to extract a plurality of field information of the drug information from the medical record data; and splicing the fields by using a connector to determine a combined field formed by the field information.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the normalization module includes: an acquisition unit configured to acquire a learning model of normalization processing; and the normalization unit is configured to normalize the combined field based on the learning model and the standard field corresponding to each independent field.
In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the normalization unit is configured to splice the standard fields corresponding to each independent field based on the input format of the learning model, and generate a standard combined field corresponding to the combined field; and normalize the combined field based on the learning model and the standard combined field.
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the method for drug information normalization as described in the above embodiments.
According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of drug information normalization as described in the above embodiments.
In the embodiment of the disclosure, a plurality of pieces of field information of medicine information are extracted from medical record data, and a combined field formed by the plurality of pieces of field information is determined; according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field; identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalizing the combined field according to the standard field corresponding to each independent field. Before normalization processing is carried out, standard field identification of a combined field and an independent field is carried out on medical record data, and the efficiency and accuracy of normalization processing are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture 100 to which the method or apparatus for drug information normalization of embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a method of drug information normalization according to one embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of drug information normalization according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a diagram of medical record data flow, according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a block diagram of an apparatus for drug information normalization according to an embodiment of the present disclosure;
FIG. 6 illustrates a schematic structural diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the method or apparatus for drug information normalization of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, the terminal device 103 (or the terminal device 101 or 102) sends a request for information processing to the server 105. Based on the request, the server 105 may extract a plurality of pieces of field information of the medicine information from the medical record data, and determine a combined field composed of the plurality of pieces of field information; identify a standard field corresponding to the combined field according to a first dictionary and the combined field, and when it is determined that the standard field corresponding to the combined field does not exist in the first dictionary, determine a plurality of independent fields corresponding to the combined field; identify a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalize the combined field according to the standard field corresponding to each independent field. The server 105 then transmits the normalization processing result to the terminal device 103, and the terminal device 103 can display the normalization processing result.
Fig. 2 schematically illustrates a flow chart of a method of drug information normalization according to one embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be processed by any electronic device with computing processing capability, for example, the server 105 and/or the terminal devices 102 and 103 in the embodiment of fig. 1 described above, and in the following embodiment, the server 105 is taken as an execution subject for example, but the present disclosure is not limited thereto.
As shown in fig. 2, a method for normalizing drug information provided by an embodiment of the present disclosure may include the following steps:
In step S210, a plurality of pieces of field information of the medicine information are extracted from the medical record data, and a combined field composed of the plurality of pieces of field information is determined.
In the embodiment of the present disclosure, the medical record data may be medical record data of bad samples fed back and labeled by hospitals and by users of each service line, where a bad sample is a sample that the algorithm module normalized incorrectly.
In the embodiment of the disclosure, normalization processing is performed based on the bad samples, whose data volume is far smaller than that of the high-frequency and common data that must be manually labeled in the related art. Moreover, adjusting the normalization processing based on the bad samples increases the generalization and interpretability of the overall normalization processing and improves its accuracy.
In the embodiment of the disclosure, after the fields of the medical record data are acquired, the field information of the medicine order part can be extracted from the fields, so as to extract the fields of the medicine information. Or, if only the identifier of the medical record is obtained, the hospital information system needs to be searched to obtain each field of the medical record, and then a plurality of fields of the medicine order part are extracted from the fields so as to extract a plurality of fields of the medicine information. It should be noted that the medical record data may include not only the medical order, but also patient information.
In the embodiment of the present disclosure, the fields of the drug information may include, but are not limited to: name of medicine, common name, dosage form, manufacturer.
In the embodiment of the present disclosure, after extracting the plurality of pieces of field information of the medicine information, the plurality of pieces of field information may be spliced to determine the combined field formed by the plurality of pieces of field information.
In the embodiment of the present disclosure, when splicing a plurality of pieces of field information, the plurality of pieces of field information may be spliced by using a preset connector, and the connector may be the same as a connector for performing standard field splicing subsequently.
Note that each field of the medicine information extracted from the medical record data has an initial value, for example, "trade name": "aminofexan", where "trade name" is the field name and "aminofexan" is the initial value of the field.
For example, the initial values of several fields in the drug order (trade name, common name, dosage form and manufacturer) are spliced with the connector "@@" to obtain the spliced combined field: NULL @ methyl Aspirin enteric-coated tablet (Baixa) [0.1g @ 30 tablets ] @ tablets @ NULL.
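The splicing step can be sketched in a few lines. The following is a minimal illustration under stated assumptions: the connector string "@@", the field order, and the use of the literal "NULL" for missing values are inferred from the example above, and the function name `build_combined_field` is hypothetical.

```python
from typing import Mapping, Optional, Sequence

CONNECTOR = "@@"  # assumed connector
# Assumed splicing order of the drug-order fields.
FIELD_ORDER = ("trade name", "common name", "dosage form", "manufacturer")


def build_combined_field(
    fields: Mapping[str, Optional[str]],
    order: Sequence[str] = FIELD_ORDER,
    connector: str = CONNECTOR,
) -> str:
    """Splice the initial values of the fields into one combined field.

    Missing or empty values are written as the literal "NULL" so that
    every combined field has the same number of parts.
    """
    parts = [fields.get(name) or "NULL" for name in order]
    return connector.join(parts)


order_fields = {
    "trade name": None,
    "common name": "Aspirin enteric-coated tablet",
    "dosage form": "tablets",
    "manufacturer": None,
}
combined = build_combined_field(order_fields)
```

Keeping the order and connector fixed matters because the combined field is later used as an exact-match dictionary key.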
In step S220, a standard field corresponding to the combined field is identified according to a first dictionary and the combined field, and when it is determined that the standard field corresponding to the combined field does not exist in the first dictionary, a plurality of independent fields corresponding to the combined field are determined.
A standard field is the standardized, normalized name for the initial value of a field; for example, the standard field of "amazamine-free" is "amazamine-paracetamol".
In the embodiment of the present disclosure, a first dictionary is preset, the first dictionary may be represented by keydit, and a correspondence relationship between a plurality of combined fields and standard fields corresponding to the combined fields is recorded in the first dictionary. After a combined field composed of a plurality of fields is obtained, a standard field corresponding to the combined field is identified from the first dictionary.
For example, table 1 is a part of the first dictionary provided in the embodiment of the present invention:
[Image: Figure BDA0002455991310000081 (Table 1)]
TABLE 1
Based on Table 1 above, the standard field of the combined field "NULL @ methyl Aspirin enteric coated tablet (baixa) [0.1g @ 30 tablets ] @ tablets @ NULL" can be obtained.
It should be noted that multiple combined fields in the first dictionary may correspond to the same standard field. The standard fields in the first dictionary may be spliced using the same splicing order and connector used in S210 to splice the pieces of field information, so that any given combined field corresponds to at most one standard field. This splicing order may differ from the splicing order required by the subsequent normalization processing, and not all of the pieces of field information are necessarily among the fields spliced for the subsequent normalization. Therefore, after the standard fields of the pieces of field information are obtained, they need to be spliced again according to the fields and the splicing order required by the subsequent normalization.
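The first-dictionary lookup is an exact-match query keyed on the whole combined field. The sketch below illustrates that idea only: the dictionary entry is hypothetical and does not reproduce Table 1, the "@@" connector is an assumption, and `lookup_combined_field` is an illustrative name.

```python
from typing import Dict, Optional

# Hypothetical first-dictionary entry (combined field -> spliced standard field).
FIRST_DICT: Dict[str, str] = {
    "NULL@@Salbutamol atomized solution@@solution@@NULL":
        "NULL@@Salbutamol sulfate solution for inhalation@@solution@@NULL",
}


def lookup_combined_field(
    combined: str, first_dict: Dict[str, str] = FIRST_DICT
) -> Optional[str]:
    """Return the standard field for a combined field, or None on a miss.

    A None result signals that the combined field must be broken into
    independent fields and resolved one field at a time.
    """
    return first_dict.get(combined)
```

Because the key is the entire spliced string, any variation in a single field produces a miss, which is why the per-field fallback is needed.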
In an embodiment of the present disclosure, when it is determined that the standard field corresponding to the combined field does not exist in the first dictionary, a plurality of independent fields corresponding to the combined field are determined.
According to the embodiment of the present disclosure, if the pieces of field information have been spliced, the combined field can be split into the independent fields at the connector used in splicing; alternatively, each independent field can be obtained directly from the extracted fields. Examples of independent fields are "trade name": "basalagiline" and "generic name": "aspirin enteric-coated tablet".
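Splitting a combined field back into independent fields is the inverse of the splicing step. A minimal sketch, assuming the connector is "@@" and the field order is known in advance (both are assumptions, as is the name `split_combined_field`):

```python
from typing import Dict, Sequence

CONNECTOR = "@@"  # assumed connector, same as used for splicing
FIELD_ORDER = ("trade name", "common name", "dosage form", "manufacturer")


def split_combined_field(
    combined: str,
    order: Sequence[str] = FIELD_ORDER,
    connector: str = CONNECTOR,
) -> Dict[str, str]:
    """Split a combined field into named independent fields."""
    values = combined.split(connector)
    if len(values) != len(order):
        # The connector occurred inside a value, or a field is missing.
        raise ValueError(
            f"expected {len(order)} parts, got {len(values)}: {combined!r}"
        )
    return dict(zip(order, values))
```

The length check guards against a value that itself contains the connector, which would otherwise shift every later field by one position.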
It should be noted that each field of the extracted drug information is an independent field.
In step S230, a standard field corresponding to each independent field is identified according to the second dictionary and the plurality of independent fields.
In the embodiment of the present disclosure, a second dictionary is preset. The second dictionary may be represented by field cut, and records the correspondence between independent fields and their standard fields. After the independent fields are obtained, the standard field corresponding to each independent field is searched for in the second dictionary. An independent field may contain a plurality of participles, or may be a field that is not suitable for participle extraction.
For example, table 2 provides a part of a second dictionary according to an embodiment of the present invention:
Independent field | Standard field
{ methyl } Aspirin enteric-coated tablet (Baia) | Aspirin enteric-coated tablet
Salbutamol atomized solution | Salbutamol sulfate solution for inhalation
TABLE 2
As shown in table 2, it should be noted that field names may be included in the independent fields as well as in the standard fields. According to table 2, if the independent field is "Salbutamol atomized solution", table 2 is looked up and the standard field for this independent field is obtained: "Salbutamol sulfate solution for inhalation".
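The second-dictionary lookup amounts to an exact-match mapping from independent field to standard field. The sketch below uses the two rows of table 2 as data; the function name and the idea of returning the unmatched fields separately are assumptions made for illustration.

```python
# Entries taken from table 2; a real second dictionary would be far larger.
SECOND_DICT = {
    "{ methyl } Aspirin enteric-coated tablet (Baia)": "Aspirin enteric-coated tablet",
    "Salbutamol atomized solution": "Salbutamol sulfate solution for inhalation",
}


def lookup_independent(
    fields: list[str], second_dict: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Split fields into those with a standard field and those without one."""
    found: dict[str, str] = {}
    missing: list[str] = []
    for field in fields:
        if field in second_dict:
            found[field] = second_dict[field]
        else:
            missing.append(field)
    return found, missing


found, missing = lookup_independent(
    ["Salbutamol atomized solution", "some unlisted drug"], SECOND_DICT
)
```

Here `found` holds the standard field for the salbutamol entry, while `missing` lists the fields that would need a finer-grained lookup.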
In S240, the combined field is normalized according to the standard field corresponding to each independent field.
In the embodiment of the present disclosure, a learning model for normalization processing may be obtained. The learning model may be a model constructed based on machine learning or deep learning; it further normalizes a field and yields the probability of each normalization result corresponding to the field. The plurality of fields on which the learning model depends are spliced with a specific connector to form an input that the learning model can recognize, so that the learning model can compute the normalization result of the combined field.
In the embodiment of the present disclosure, after the standard field of each independent field is determined, the combined field may be normalized based on the learning model and the standard field corresponding to each independent field.
In the embodiment of the disclosure, based on the input format of the learning model, the spliced field and the splicing order are determined from the standard fields of the independent fields, and the standard fields are spliced according to the spliced field and the splicing order to obtain the spliced standard combination field.
It should be noted that, during the splicing, the standard fields of the independent fields may be spliced with a preset connector. The spliced fields determined based on the learning model may be all or part of the standard fields obtained, and the splicing order determined based on the learning model may differ from the splicing order of the combined field formed from the plurality of fields.
For example, suppose the standard fields of the plurality of independent fields are { "bai aspalaling", "aspirin enteric-coated tablet", "bayer", "tablet" }. According to the input format of the learning model, the spliced fields and the splicing order are the standard fields of the trade name, common name, formulation and manufacturer. With the preset connector @ placed between the standard fields, the standard combination field obtained after splicing is "bai aspalaling@aspirin enteric-coated tablet@tablet@bayer".
It should be noted that the standard combination field is composed of standard fields of independent fields.
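The splicing of standard fields into the model's input format can be sketched as below. The field names, the assignment of "tablet" to formulation and "bayer" to manufacturer, and the choice to leave an empty slot for a missing field are assumptions for illustration; the text does not specify how a missing field is encoded.

```python
# Assumed input order of the learning model, following the example in the text.
MODEL_INPUT_ORDER = ["trade name", "common name", "formulation", "manufacturer"]


def build_standard_combined(
    standard_fields: dict[str, str],
    order: list[str],
    connector: str = "@",
) -> str:
    """Splice standard fields in the model's order; absent fields become empty slots."""
    return connector.join(standard_fields.get(name, "") for name in order)


standard_fields = {
    "trade name": "bai aspalaling",
    "common name": "aspirin enteric-coated tablet",
    "manufacturer": "bayer",
    "formulation": "tablet",
}

standard_combined = build_standard_combined(standard_fields, MODEL_INPUT_ORDER)
```

Because the order comes from `MODEL_INPUT_ORDER` rather than from the dictionary, the result is independent of the order in which the fields were extracted.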
In the embodiment of the disclosure, a plurality of pieces of field information of medicine information are extracted from medical record data, and a combined field formed by the plurality of pieces of field information is determined; according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field; identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalizing the combined field according to the standard field corresponding to each independent field. Before normalization processing is carried out, standard field identification of a combined field and an independent field is carried out on medical record data, and the efficiency and accuracy of normalization processing are improved.
Fig. 3 schematically shows a flowchart of a method for normalizing drug information according to another embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be performed by any electronic device with computing and processing capability, such as the server 105 and/or the terminal devices 102 and 103 in the embodiment of fig. 1. In the following embodiments, the server 105 is taken as the execution subject for illustration, but the present disclosure is not limited thereto.
As shown in fig. 3, a method for normalizing drug information provided by an embodiment of the present disclosure may include the following steps:
In step S301, when it is determined that not every independent field has a corresponding standard field in the second dictionary, the independent fields that have no corresponding standard field in the second dictionary are obtained.
In the embodiment of the present disclosure, the second dictionary may contain standard fields for only some of the independent fields, or for none of them. In that case, the independent fields that have no corresponding standard field in the second dictionary are obtained.
In step S302, field participles (that is, word segments) are extracted from the independent fields.
In step S303, a standard field corresponding to each field participle is identified according to a third dictionary and the field participles.
In the embodiment of the present disclosure, a third dictionary is preset. The third dictionary may be denoted by termdect, and it records the correspondence between each field participle and its standard field. After the field participles of each independent field that has no standard field in the second dictionary are obtained, the standard field corresponding to each field participle is looked up in the third dictionary. The standard fields in the third dictionary can be formed by analyzing the independent fields and extracting effective feature words or synonyms from them, which avoids the word segmentation and semantic problems that arise when an algorithm strategy cannot recognize certain medical data.
For example, table 3 is a part of the third dictionary provided in the embodiment of the present invention:
Field participle | Standard field
Amazofamid | Paracetamol, chlorphenamine maleate and chlorphenamine maleate
TABLE 3
As shown in table 3, it should be noted that a plurality of field participles may correspond to one standard field. According to table 3, if the field participle of an independent field is "am not kamin", then the standard field of the independent field is obtained as "am fen kamin".
In an embodiment of the present disclosure, for an independent field that has no corresponding standard field in the second dictionary, when it is determined that the standard field of its field participle does not exist in the third dictionary either, the standard field of the independent field is determined based on the initial field of the independent field.
For example, assuming that the independent field is "ammonia insensitive", that the standard field of the independent field is not present in the second dictionary, and that the standard field of its field participle is not present in the third dictionary, then "ammonia insensitive" itself is determined as the standard field of the independent field.
In step S304, a standard field corresponding to each independent field is determined according to the standard field corresponding to the field participle.
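Steps S302 to S304, together with the fallback to the initial field, can be sketched as follows. The whitespace tokenizer is only a stand-in for a real word segmenter, the dictionary entry uses the placeholder names C1 and C' from the description of fig. 4, and taking the first matching participle is an assumption, since the text does not say how several matches would be combined.

```python
def segment(field: str) -> list[str]:
    """Stand-in segmenter: splits on whitespace (a real system would use a word segmenter)."""
    return field.split()


def resolve_with_third_dict(field: str, third_dict: dict[str, str]) -> str:
    """Return the standard field of the first participle found in the third dictionary.

    If no participle has an entry, the initial field itself is used as the standard field.
    """
    for participle in segment(field):
        if participle in third_dict:
            return third_dict[participle]
    return field


# Hypothetical third dictionary with one placeholder entry.
THIRD_DICT = {"C1": "C'"}

resolved_c = resolve_with_third_dict("C1 extra", THIRD_DICT)  # participle C1 has an entry
resolved_b = resolve_with_third_dict("B", THIRD_DICT)         # no entry, keep the initial field
```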
In the embodiment of the disclosure, before normalization processing is performed, the medical record data is checked at multiple levels, from coarse granularity to fine granularity (from the combined field, to the independent fields, and down to the field participles), based on the first dictionary, the second dictionary and the third dictionary in order to determine the standard fields. This improves the accuracy of standardization and the efficiency of searching for standard values, and thus improves the efficiency and accuracy of normalization processing.
Fig. 4 schematically illustrates a diagram of medical record data flow according to an embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be performed by any electronic device with computing and processing capability, such as the server 105 and/or the terminal devices 102 and 103 in the above embodiment of fig. 1. In the following embodiment, the server 105 is taken as the execution subject for illustration, but the present disclosure is not limited thereto.
As shown in fig. 4, three fields of the medicine information, A, B and C, are extracted. Standard field identification is first performed on the combined field: the fields are spliced into the combined field A + B + C, and the first dictionary is searched for the standard field corresponding to the combined field. If no standard field is found, standard field identification is performed on the independent fields, and the standard fields corresponding to fields A, B and C are searched for respectively in the second dictionary. Assume that the standard field of field A is A' and that no standard field is found for B or C, so that standard fields have not been found for all the independent fields. Standard field identification of field participles is then performed on the independent fields for which no standard field was found. For example, the field participle C1 is extracted from C, and the standard field C' corresponding to C1 is found in the third dictionary; assuming that the field participle extracted from B has no standard field in the third dictionary, B itself is taken as the standard value of that field. Further, the standard fields determined from the independent fields and the standard fields determined from the field participles are combined to obtain the standard fields of all the fields; for example, the standard values corresponding to fields A, B and C are A', B and C', respectively. The standard fields of the independent fields are then spliced to obtain the standard combination field A'@C'@B, which is input to the learning model for normalization processing to obtain a plurality of normalized fields, such as normalization result K1 = V1, normalization result K2 = V2 and normalization result K3 = V3, where K denotes a field name and V denotes the corresponding normalization result.
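The data flow of fig. 4 can be traced end to end with a short sketch. Everything here is illustrative: the function and parameter names are hypothetical, the segmenter is a lookup stub, the first dictionary is assumed to store values already spliced in the model's order, and the learning model itself is left out, so the sketch stops at the standard combination field.

```python
from typing import Callable


def standard_combined_field(
    fields: dict[str, str],
    first_dict: dict[str, str],
    second_dict: dict[str, str],
    third_dict: dict[str, str],
    segment: Callable[[str], list[str]],
    model_order: list[str],
    connector: str = "@",
) -> str:
    """Resolve standard fields level by level and splice them for the model."""
    # Level 1: look up the whole combined field in the first dictionary.
    combined = connector.join(fields.values())
    if combined in first_dict:
        return first_dict[combined]

    standard: dict[str, str] = {}
    for name, value in fields.items():
        if value in second_dict:
            # Level 2: exact match on the independent field.
            standard[name] = second_dict[value]
        else:
            # Level 3: match on field participles, else keep the initial field.
            hits = [third_dict[p] for p in segment(value) if p in third_dict]
            standard[name] = hits[0] if hits else value

    # Splice in the order expected by the learning model.
    return connector.join(standard[name] for name in model_order)


# Placeholder data mirroring the A, B, C example.
fields = {"a": "A", "b": "B", "c": "C"}
second_dict = {"A": "A'"}
third_dict = {"C1": "C'"}
stub_segment = lambda value: {"C": ["C1"]}.get(value, [value])  # hypothetical segmenter

model_input = standard_combined_field(
    fields, {}, second_dict, third_dict, stub_segment, ["a", "c", "b"]
)
```

With this data, A is resolved by the second dictionary, C by the third dictionary through its participle C1, and B falls back to itself, so `model_input` matches the standard combination field A'@C'@B described in the text.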
In the embodiment of the disclosure, the standard fields are identified through the multi-level sequence of the combined fields, the independent fields and the field participles, and compared with a traditional single-layer intervention mechanism, the method improves the standardization efficiency and accuracy, thereby improving the efficiency and accuracy of normalization processing, and simultaneously improving the generalization capability of the learning model of the normalization processing on a small number of bad samples.
Embodiments of the apparatus of the present disclosure are described below, which may be used to perform the above-mentioned method for normalizing the medicine information of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method for normalizing drug information described above in the present disclosure.
Fig. 5 schematically illustrates a block diagram of an apparatus for drug information normalization according to an embodiment of the present disclosure.
Referring to fig. 5, an apparatus 500 for normalizing drug information according to an embodiment of the present disclosure may include: an extraction module 510, a determination module 520, an identification module 530, and a normalization module 540.
The extraction module 510 can be configured to extract a plurality of field information of the drug information from the medical record data and determine a combined field composed of the plurality of field information.
The determining module 520 may be configured to identify a standard field corresponding to the combined field according to a first dictionary and the combined field, and determine a plurality of independent fields corresponding to the combined field when determining that the standard field corresponding to the combined field does not exist in the first dictionary.
The identification module 530 may be configured to identify a standard field corresponding to each of the plurality of independent fields based on the second dictionary and the plurality of independent fields.
The normalization module 540 may be configured to normalize the combined field according to the standard field corresponding to each independent field.
In the embodiment of the disclosure, a plurality of pieces of field information of medicine information are extracted from medical record data, and a combined field formed by the plurality of pieces of field information is determined; according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field; identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields; and normalizing the combined field according to the standard field corresponding to each independent field. Before normalization processing is carried out, multi-level standard field identification is carried out on medical record data, and the efficiency and accuracy of normalization processing are improved.
FIG. 6 illustrates a schematic structural diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure. It should be noted that the computer system 600 of the electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for system operation are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, the processes described above with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by a Central Processing Unit (CPU) 601, various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units may also be disposed in a processor. Wherein the names of such modules and/or units do not in some way constitute a limitation on the modules and/or units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments above. For example, the electronic device may implement the steps shown in fig. 2, or fig. 3, or fig. 4.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for normalizing drug information, comprising:
extracting a plurality of field information of the medicine information from the medical record data, and determining a combined field formed by the plurality of field information;
according to a first dictionary and the combined field, identifying a standard field corresponding to the combined field, and when determining that the standard field corresponding to the combined field does not exist in the first dictionary, determining a plurality of independent fields corresponding to the combined field;
identifying a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields;
and normalizing the combined field according to the standard field corresponding to each independent field.
2. The method of claim 1, wherein the method further comprises:
when it is determined that not every independent field has a corresponding standard field in the second dictionary, acquiring each independent field that has no corresponding standard field in the second dictionary;
extracting field participles from the independent fields;
according to a third dictionary and the field participles, identifying standard fields corresponding to the field participles;
and determining the standard field corresponding to each independent field according to the standard field corresponding to the field participle.
3. The method of claim 2, wherein determining the standard field corresponding to each independent field according to the standard field corresponding to the field participle comprises:
if some of the independent fields have corresponding standard fields in the second dictionary, determining the standard field corresponding to each independent field according to the standard fields found in the second dictionary for those independent fields and the standard fields corresponding to the field participles of the independent fields that have no standard field in the second dictionary;
and if none of the independent fields has a corresponding standard field in the second dictionary, determining the standard field corresponding to each independent field according to the standard field corresponding to the field participle of each independent field.
4. The method of claim 2, wherein the method further comprises:
for an independent field that has no corresponding standard field in the second dictionary, when it is determined that the standard field of its field participle does not exist in the third dictionary, determining the standard field of the independent field based on an initial field of the independent field.
5. The method of claim 1, wherein extracting a plurality of field information of the drug information from the medical record data, and determining a combined field of the plurality of field information comprises:
extracting a plurality of field information of the medicine information from the medical record data;
and splicing the fields by using a connector, and determining a combined field formed by the information of the fields.
6. The method of claim 1, wherein normalizing the combined field according to the standard field corresponding to each independent field comprises:
acquiring a learning model for normalization processing;
and normalizing the combined field based on the learning model and the standard field corresponding to each independent field.
7. The method of claim 6, wherein normalizing the combined field based on the learning model and the standard field corresponding to each individual field comprises:
splicing the standard fields corresponding to each independent field based on the input format of the learning model to generate a standard combination field corresponding to the combined field;
and normalizing the combined field based on the learning model and the standard combination field.
8. An apparatus for normalizing drug information, comprising:
an extraction module configured to extract a plurality of pieces of field information of the medicine information from the medical record data and determine a combined field formed by the plurality of pieces of field information;
a determining module configured to identify a standard field corresponding to the combined field according to a first dictionary and the combined field, and determine a plurality of independent fields corresponding to the combined field when it is determined that the standard field corresponding to the combined field does not exist in the first dictionary;
a recognition module configured to recognize a standard field corresponding to each independent field according to a second dictionary and the plurality of independent fields;
and a normalization module configured to normalize the combined field according to the standard field corresponding to each independent field.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
CN202010306577.0A 2020-04-17 2020-04-17 Medicine information normalization method and device, storage medium and electronic equipment Pending CN111523309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010306577.0A CN111523309A (en) 2020-04-17 2020-04-17 Medicine information normalization method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111523309A true CN111523309A (en) 2020-08-11

Family

ID=71904225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010306577.0A Pending CN111523309A (en) 2020-04-17 2020-04-17 Medicine information normalization method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111523309A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641881A (en) * 2021-08-23 2021-11-12 北京字跳网络技术有限公司 Metadata display method, device, equipment and medium
CN113948170A (en) * 2021-08-18 2022-01-18 天津开心生活科技有限公司 Treatment duration obtaining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination