WO2022151896A1 - Method, apparatus, electronic device, and computer medium for determining a drug code

Info

Publication number
WO2022151896A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
drug
component
components
screening candidate
Prior art date
Application number
PCT/CN2021/138298
Other languages
English (en)
French (fr)
Inventor
赵楠
吴友政
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Priority to JP2023553759A (JP2023550212A)
Priority to US18/272,315 (US20240071630A1)
Publication of WO2022151896A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Definitions

  • The present disclosure relates to the field of computer technology, specifically to artificial intelligence technology, and in particular to a method, an apparatus, an electronic device, a computer-readable medium, and a computer program product for determining a drug code.
  • ATC Anatomical Therapeutic Chemical
  • In existing approaches, the ATC code of a drug is generally predicted by applying a learned classification algorithm to the drug's molecular formula structure.
  • Predicting the ATC code from the molecular formula structure is complicated and not very accurate, and it is not suitable for drugs other than newly developed drugs.
  • Embodiments of the present disclosure propose a method, apparatus, electronic device, computer-readable medium, and computer program product for determining a drug code.
  • In a first aspect, an embodiment of the present disclosure provides a method for determining a drug code, the method comprising: obtaining an instruction text of a drug; extracting key drug information from the instruction text; obtaining, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code; and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the Anatomical Therapeutic Chemical (ATC) classification system code of the drug.
  • In some embodiments, the key drug information includes drug components, and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug includes: for each code in the at least one code, detecting whether the component corresponding to the code satisfies one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components; in response to determining that the component corresponding to the code satisfies one of the plurality of rules and that all codes have been detected, determining primary screening candidate codes that include the code; and in response to detecting that the primary screening candidate codes contain only one code, determining that code to be the ATC classification system code of the drug.
  • In some embodiments, the plurality of rules, ordered from high to low priority, are: 1) when there are two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound"; 3) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when there is one drug component, the component corresponding to the code includes that drug component.
  • In some embodiments, the key drug information includes drug components, and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug includes: for each code in the at least one code, detecting whether the component corresponding to the code matches the drug components; in response to determining that the component corresponding to the code matches the drug components and that all codes have been detected, obtaining primary screening candidate codes that include the code; and in response to detecting that the primary screening candidate codes contain only one code, determining that code to be the ATC classification system code of the drug.
  • In some embodiments, the key drug information further includes drug indications, and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug further includes: in response to detecting that the primary screening candidate codes contain multiple codes, determining the disease type corresponding to the drug based on the drug indications; and selecting, from the primary screening candidate codes, the code corresponding to the disease type as the ATC classification system code of the drug.
  • In some embodiments, determining the disease type corresponding to the drug based on the drug indications includes: classifying the indications with a pre-trained classification model and obtaining the disease type output by the classification model.
  • Embodiments of the present disclosure provide a device for determining a drug code, the device comprising: an acquisition unit configured to acquire an instruction text of a drug; an extraction unit configured to extract key drug information from the instruction text; an obtaining unit configured to obtain, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code; and a screening unit configured to screen the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug.
  • In some embodiments, the key drug information includes drug components, and the screening unit includes: a detection module configured to detect, for each code in the at least one code, whether the component corresponding to the code satisfies one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components; a primary screening module configured to determine, in response to determining that the component corresponding to the code satisfies one of the plurality of rules and that all codes have been detected, primary screening candidate codes that include the code; and a determining module configured to determine, in response to detecting that the primary screening candidate codes contain only one code, that code to be the ATC classification system code of the drug.
  • In some embodiments, the plurality of rules, ordered from high to low priority, are: 1) when there are two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound"; 3) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when there is one drug component, the component corresponding to the code includes that drug component.
  • In some embodiments, the key drug information includes drug components, and the screening unit includes: a matching module configured to detect, for each code in the at least one code, whether the component corresponding to the code matches the drug components; a response module configured to obtain, in response to determining that the component corresponding to the code matches the drug components and that all codes have been detected, primary screening candidate codes that include the code; and a coding module configured to determine, in response to detecting that the primary screening candidate codes contain only one code, that code to be the ATC classification system code of the drug.
  • In some embodiments, the key drug information further includes drug indications, and the screening unit further includes: a classification module configured to determine, in response to detecting that the primary screening candidate codes contain multiple codes, the disease type corresponding to the drug based on the drug indications; and a confirmation module configured to select, from the primary screening candidate codes, the code corresponding to the disease type as the ATC classification system code of the drug.
  • In some embodiments, the classification module is further configured to classify the indications with a pre-trained classification model and obtain the disease type output by the classification model.
  • Embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
  • embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method described in any implementation manner of the first aspect.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the method described in any implementation manner of the first aspect.
  • In the method and device for determining a drug code provided by the embodiments of the present disclosure: first, the instruction text of the drug is obtained; second, the key drug information in the instruction text is extracted; third, at least one code related to the key drug information, and the component corresponding to each code, are obtained based on a pre-created code inverted index; finally, the at least one code is screened based on the key drug information and the component corresponding to each code to obtain the ATC classification system code of the drug.
  • ATC coding can be automatically performed on drugs through the pre-created coding inverted index according to the instruction text of the drug, which solves the problems faced by the majority of pharmacists in their work, and provides basic coding information for the medical information system.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure may be applied;
  • FIG. 2 is a flowchart of one embodiment of a method for determining a drug code according to the present disclosure
  • FIG. 3 is a flow diagram of one embodiment of a method of obtaining an anatomical therapeutic and chemical classification system code for a drug product according to the present disclosure
  • FIG. 4 is a flowchart of another embodiment of a method of obtaining an anatomical therapeutic and chemical classification system code for a drug product according to the present disclosure
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for determining a drug code according to the present disclosure
  • FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 to which the method of determining a drug code of the present disclosure may be applied.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, and may typically include wireless communication links and the like.
  • the terminal devices 101, 102, and 103 interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as instant messaging tools, email clients, and the like.
  • The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be user devices with communication and control functions that can communicate with the server 105.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the above-mentioned user devices, and can be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. There is no specific limitation here.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for the drug processing system on the terminal devices 101 , 102 , and 103 to determine drug codes.
  • the backend server can analyze and process the instruction text of the medicine in the network, and feed back the processing result (such as the determined ATC code) to the terminal device.
  • The server may be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. There is no specific limitation here.
  • the method for determining the drug code provided by the embodiments of the present disclosure is generally executed by the server 105 .
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • Referring to FIG. 2, a flow 200 of an embodiment of a method for determining a drug code according to the present disclosure is shown. The method includes the following steps:
  • Step 201: obtain the instruction text of the drug.
  • the instruction manual of a drug refers to a legal document indicating important information of the drug, and is a legal guide for selecting a drug. Accurately reading and understanding the instruction manual before taking medicine is a prerequisite for safe drug use.
  • the instructions of the drug include the name, specification, manufacturer, validity period, usage, dosage, drug ingredients, indications or functions, contraindications, adverse reactions and precautions of the drug.
  • the name of the drug includes: generic name, trade name, English name, chemical name, etc. As long as the user knows the generic name of the drug, the user can avoid repeated use of the drug.
  • the instruction text of a drug is a text for indicating the contents of the instruction manual of a drug.
  • the execution subject on which the method for determining the drug code runs may obtain the instruction text through various means, for example, obtain the instruction text from the terminal in real time, or read the instruction text from the memory, which is not limited in this embodiment.
  • Step 202: extract the key drug information from the instruction text.
  • the key information of the drug includes the drug ingredients or information related to the drug ingredients, and the information related to the drug ingredients includes: the name of the drug, the indications or functions of the drug, contraindications, adverse reactions, etc.
  • the drug component may also be the main component of the drug.
  • Natural language processing technology has been widely used in life scenarios that require semantic understanding.
  • For example, entity recognition technology can identify entities (such as drug names, disease names, and treatment methods) in a piece of text, so that content such as diagnoses and prescriptions in doctors' orders can be parsed automatically and medical information can be managed in a structured way.
  • As another example, text classification technology can be applied to intelligent triage: it parses the patient's description of their condition, matches the appropriate clinic based on that description, and improves triage efficiency.
  • the combination of natural language processing technology and medical scenarios can improve the intelligence of medical scenarios and provide users with a better experience.
  • the ingredients of the drug, the name of the drug, the indications or functions of the drug, contraindications, adverse reactions, etc. in the instruction text can be extracted through natural language processing.
  • The drug ingredients are generally given in a short natural-language description, for example: "This product is a compound preparation, containing 10 mg of clindamycin hydrochloride (calculated as clindamycin) and 8 mg of metronidazole per milliliter. Excipients: glycerol, ethanol."
  • A natural language processing model, such as a named entity recognition model, can extract the drug components from such text; the main components are the non-auxiliary components, that is, those that are not excipients.
  • For the example above, the drug components extracted by the natural language processing model include: clindamycin hydrochloride, metronidazole, glycerol, and ethanol.
  • A natural language model composed of BERT (Bidirectional Encoder Representations from Transformers) and a CRF (conditional random field) can be trained to recognize drug-component entities, and the key drug information is then obtained through the trained named entity recognition model.
  • the accuracy rate of the recognition result of the named entity recognition model can be close to 90%, which can fully meet the requirements of actual clinical use.
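As a rough illustration of how the tagged output of a named entity recognition model could be turned into a list of drug components, the sketch below decodes a BIO tag sequence. The BIO scheme, the `COMPONENT` label, the tokenization, and the function name `decode_bio` are assumptions made for illustration; the source does not specify the model's output format, and the tags here are written by hand rather than produced by a trained BERT + CRF model.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) spans from a BIO tag sequence."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the previous one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag (or an inconsistent "I-" tag): close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hand-written tokens and tags for the example sentence (assumed model output).
tokens = ["containing", "clindamycin", "hydrochloride", "and", "metronidazole",
          ";", "excipients", ":", "glycerol", ",", "ethanol"]
tags = ["O", "B-COMPONENT", "I-COMPONENT", "O", "B-COMPONENT",
        "O", "O", "O", "B-COMPONENT", "O", "B-COMPONENT"]

components = [text for text, kind in decode_bio(tokens, tags) if kind == "COMPONENT"]
print(components)
```

Separating main components from excipients would need an additional label or a post-processing step, which this sketch does not attempt.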
  • Based on their differing nature and characteristics, medicines are divided into compound medicines and single-prescription medicines.
  • a single prescription drug refers to a single drug preparation, and a single prescription drug mainly contains one drug ingredient.
  • Compound medicine refers to the mixed preparation of two or more kinds of medicines, which can be Chinese medicine, Western medicine or a mixture of Chinese and Western medicines.
  • Combination drugs contain two or more drug ingredients.
  • the medicine components in the key information of medicines may refer to one type or multiple types.
  • Step 203: obtain, based on the pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code.
  • the coded inverted index is an index library created before extracting the key drug information in the instruction text, and the coded inverted index only needs to be created once and can be used repeatedly.
  • The code inverted index is built for the kind of drug code that needs to be determined.
  • The method provided in this application is used to determine the ATC code of a drug. Therefore, the code inverted index can be built by indexing, group by group, the classification information (ATC Chinese name, ATC English name, ATC code) of the ATC classification standard defined by the World Health Organization. An example of a code inverted index is shown in Table 1.
  • Table 1 includes the ATC code and the Chinese name and English name of the chemical substance corresponding to the ATC code, wherein the chemical substance is also a pharmaceutical ingredient, that is, the pharmaceutical ingredient corresponding to the ATC code.
  • For example, the English name corresponding to the Chinese name of ticlatone is "ticlatone", and the corresponding ATC code is "D01AE08".
  • Multiple drug components necessarily correspond to multiple ATC codes, and a single drug component may also correspond to multiple ATC codes.
  • For example, the ATC codes corresponding to drugs containing "tegafur" may include "L01BC03" and "L01BC53".
  • Search engine software (e.g., Elasticsearch) can be used to index the ATC codes defined by the World Health Organization.
  • With a search engine built on an inverted index, looking up a text field is straightforward. For example, when searching for classification information whose ATC Chinese name contains a certain term (such as metronidazole), all ATC codes whose Chinese name contains "metronidazole" are easily found; the results include: metronidazole, A01AB17; lansoprazole, amoxicillin and metronidazole, A02BD03; and so on. Therefore, the codes and the components corresponding to the codes can be obtained easily through the search engine software.
  • For each drug component, the maximum number of returned codes (or of components corresponding to the codes) can be set to n (n>1); for example, n is set to 10.
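The lookup described above can be sketched with a plain in-memory inverted index. This is only a stand-in for search engine software such as Elasticsearch: the tokenizer, the function names, and the use of English names instead of Chinese names are assumptions for illustration, and the three entries are the examples quoted in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (ATC name, ATC code) pairs taken from the examples in the text.
ATC_ENTRIES: List[Tuple[str, str]] = [
    ("metronidazole", "A01AB17"),
    ("lansoprazole, amoxicillin and metronidazole", "A02BD03"),
    ("ticlatone", "D01AE08"),
]


def tokenize(name: str) -> List[str]:
    """Split an ATC name into lowercase terms, dropping commas and 'and'."""
    return [t for t in name.lower().replace(",", " ").split() if t != "and"]


def build_inverted_index(entries: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each term to the (code, name) pairs whose name contains that term."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, code in entries:
        for term in set(tokenize(name)):
            index[term].append((code, name))
    return index


def lookup(index: Dict[str, List[Tuple[str, str]]], component: str, n: int = 10) -> List[Tuple[str, str]]:
    """Return at most n (code, name) pairs for a single-term drug component."""
    return index.get(component.lower(), [])[:n]


index = build_inverted_index(ATC_ENTRIES)
print(lookup(index, "metronidazole"))
```

The index is built once and reused for every query, and the parameter `n` plays the role of the maximum number of returned codes. Multi-word components would need phrase handling that this sketch leaves out.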
  • Step 204: screen the at least one code based on the key drug information and the component corresponding to each code to obtain the ATC classification system code of the drug.
  • The at least one code may be a single code or more than one code, so the number of codes may be detected first. When there is only one code, the obtained code is the ATC code.
  • In some optional implementations of this embodiment, the key drug information includes drug components, and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug includes: for each code in the at least one code, detecting whether the component corresponding to the code matches the drug components; in response to determining that the component corresponding to the code matches the drug components and that all codes have been detected, obtaining primary screening candidate codes that include the code; and in response to detecting that the primary screening candidate codes contain only one code, determining that code to be the ATC classification system code of the drug.
  • The drug components and the components corresponding to the codes may be expressed in different languages.
  • Whether a component corresponding to a code matches a drug component can be detected from the similarity of their content (Chinese name or English word), or from the diseases the two are used to treat; for example, a match is determined when the drug component and the component corresponding to the code treat two or more of the same diseases.
  • other methods may also be used to detect whether the components of the drug match the components corresponding to the codes, which are not limited.
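One way the content-similarity check could look is sketched below, using Python's standard `difflib.SequenceMatcher`. The source does not name a similarity measure, so the choice of measure, the function name `names_match`, and the threshold of 0.8 are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def names_match(drug_component: str, code_component: str, threshold: float = 0.8) -> bool:
    """Treat two component names as matching when their similarity ratio reaches the threshold."""
    a = drug_component.strip().lower()
    b = code_component.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(names_match("Metronidazole", "metronidazole"))
print(names_match("metronidazole", "ticlatone"))
```

A string-similarity measure like this only compares names written in the same language; matching a Chinese name against an English name would require a translation table or a shared identifier.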
  • The primary screening candidate codes include all codes, among the at least one code, whose corresponding components match the drug components of the drug.
  • When a code is found to match, it is added to the primary screening candidate codes. After all of the at least one code have been detected, the number of primary screening candidate codes is determined, and when there is only one, it is the ATC code of the drug. The primary screening candidate codes can thus be obtained simply by matching the drug components against the inverted index results, which is simple to implement and convenient to operate.
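The simple matching embodiment can be sketched as a filter followed by a count check. Exact, case-insensitive equality is used as the matching criterion here, which is an assumption; the function names and the retrieved list are likewise illustrative.

```python
from typing import List, Optional, Tuple


def primary_screen(drug_components: List[str], retrieved: List[Tuple[str, str]]) -> List[str]:
    """Keep the codes whose component equals one of the drug components (case-insensitive)."""
    wanted = {c.lower() for c in drug_components}
    return [code for code, component in retrieved if component.lower() in wanted]


def resolve(candidates: List[str]) -> Optional[str]:
    """Return the ATC code when exactly one candidate remains, otherwise None."""
    return candidates[0] if len(candidates) == 1 else None


# (code, component) pairs as they might come back from the inverted index.
retrieved = [
    ("A01AB17", "metronidazole"),
    ("A02BD03", "lansoprazole, amoxicillin and metronidazole"),
]
candidates = primary_screen(["metronidazole"], retrieved)
print(candidates, resolve(candidates))
```

When `resolve` returns `None` because several candidates remain, a further disambiguation step is needed.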
  • In some optional implementations of this embodiment, the key drug information further includes drug indications, and screening the at least one code, based on the key drug information and the component corresponding to each code, to obtain the ATC classification system code of the drug further includes: in response to detecting that the primary screening candidate codes contain multiple codes, determining the disease type corresponding to the drug based on the drug indications; and selecting, from the primary screening candidate codes, the code corresponding to the disease type as the ATC classification system code of the drug.
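A minimal sketch of this disambiguation step follows. It assumes that a disease type can be mapped to the first letter of an ATC code (the first-level anatomical group), and it replaces the pre-trained classification model with a keyword stub; the mapping, the keywords, the function names, and the candidate list are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from a disease type to an ATC first-level group letter.
DISEASE_TYPE_TO_ATC_GROUP: Dict[str, str] = {
    "digestive": "A",   # alimentary tract and metabolism
    "skin": "D",        # dermatologicals
    "infection": "J",   # antiinfectives for systemic use
}


def classify_indication(indication: str) -> str:
    """Keyword stand-in for the pre-trained classification model (not a real model)."""
    text = indication.lower()
    if "skin" in text or "acne" in text:
        return "skin"
    if "oral" in text or "gingivitis" in text or "stomach" in text:
        return "digestive"
    return "infection"


def select_by_disease_type(candidates: List[str], indication: str) -> Optional[str]:
    """Pick the single candidate whose first-level group fits the disease type, if any."""
    group = DISEASE_TYPE_TO_ATC_GROUP[classify_indication(indication)]
    matches = [code for code in candidates if code.startswith(group)]
    return matches[0] if len(matches) == 1 else None


# Illustrative candidate codes for a metronidazole product.
candidates = ["A01AB17", "J01XD01"]
print(select_by_disease_type(candidates, "gingivitis and other oral infections"))
```

In practice the classifier's label set and the way labels map to codes would be design decisions; the source only states that the code corresponding to the disease type is selected.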
  • the method for determining the drug code provided by the embodiments of the present disclosure: first, obtain the instruction text of the drug; secondly, extract the key information of the drug in the instruction text; then, based on the pre-created coding inverted index, obtain the key information related to the drug At least one code and the component corresponding to each code; finally, based on the key information of the drug and the component corresponding to each code, the at least one code is screened to obtain the anatomical therapeutic and chemical classification system code of the drug.
  • ATC coding can be automatically performed on drugs through the pre-created coding inverted index according to the instruction text of the drug, which solves the problems faced by the majority of pharmacists in their work, and provides basic coding information for the medical information system.
  • FIG. 3 shows a flow 300 of one embodiment of a method for obtaining the ATC classification system code of a drug according to the present disclosure. The method includes the following steps:
  • Step 301: for each code in the at least one code, detect whether the component corresponding to the code satisfies one of multiple rules having a priority order; if it does, execute step 302.
  • multiple rules are determined based on drug components, and after a component corresponding to the code satisfies any one of the multiple rules according to the priority order of the rules, other rules in the multiple rules may not be considered.
  • The multiple rules, sorted in descending order of priority, are: 1) when there are two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound"; 3) when there are two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when there is one drug component, the component corresponding to the code includes that drug component.
  • the content, priority order, and number of each of the above-mentioned multiple rules can be adaptively adjusted based on the drug components in the instruction text of the drug.
  • a plurality of rules may only include the above 1) and 4).
  • a plurality of rules may only include the above 1)-3).
  • Multiple rules with a priority order can be applied to both single-prescription medicines and compound medicines, with compound medicines taken as the priority object, which improves the reliability and comprehensiveness of checking the components corresponding to the codes.
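The four prioritized rules can be sketched as a single function that returns the number of the first rule a code's component satisfies. Substring containment as the test for "includes", the marker word passed as a parameter, the function names, and the component name "tegafur, compound" are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def matched_rule(drug_components: List[str], code_component: str,
                 marker: str = "compound") -> Optional[int]:
    """Return the number (1-4) of the highest-priority rule satisfied, or None."""
    name = code_component.lower()
    present = [c for c in drug_components if c.lower() in name]
    has_marker = marker in name
    if len(drug_components) >= 2:
        if len(present) == len(drug_components):
            return 1  # the code's component includes all drug components
        if present and has_marker:
            return 2  # at least one drug component, with the marker word
        if present:
            return 3  # at least one drug component, without the marker word
        return None
    if len(drug_components) == 1 and present:
        return 4      # single-component drug whose component is included
    return None


def primary_candidates(drug_components: List[str],
                       retrieved: List[Tuple[str, str]]) -> List[str]:
    """Keep every code whose component satisfies one of the rules."""
    return [code for code, comp in retrieved
            if matched_rule(drug_components, comp) is not None]


print(matched_rule(["lansoprazole", "amoxicillin", "metronidazole"],
                   "lansoprazole, amoxicillin and metronidazole"))
print(matched_rule(["tegafur", "uracil"], "tegafur, compound"))
```

Checking the rules in priority order and returning at the first hit reflects the statement that once one rule is satisfied the remaining rules need not be considered.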
  • Step 302: check whether all codes in the at least one code have been detected; if so, go to step 303; if not, return to step 301.
  • Here, "the code" refers to each code, taken in sequence, in the at least one code; it is also called the current code.
  • If the current code satisfies one of the multiple rules, it is added to the primary screening candidate codes. If the current code does not satisfy any of the multiple rules, it is discarded, the flow returns to step 301, and the next code after the current code in the at least one code becomes the current code.
  • Step 303 determine the primary screening candidate codes including the code, and then perform step 304 .
  • In this embodiment, the primary screening candidate codes are the ATC codes first found to meet the requirements of the drug's instruction text. The component corresponding to each of them satisfies one of the multiple rules with a priority order. There may be only one primary screening candidate code, or there may be several.
  • In this optional implementation, the primary screening candidate codes include every code in the at least one code whose corresponding component satisfies one of the multiple rules with a priority order.
  • Step 304 check whether there is only one code for the primary screening candidate code; if the detection result is that there is only one code, step 305 is executed.
  • In this embodiment, when the detection result is that there is only one code, the current primary screening candidate code is determined to be the ATC code of the drug, and no further detection is required.
  • When the primary screening candidate codes comprise multiple codes, similarity matching may optionally be performed on them, and one of the primary screening candidate codes with the highest similarity is used as the ATC code of the drug.
  • Optionally, when the instruction text is that of a compound drug, the codes among the primary screening candidate codes whose corresponding component contains the word "compound" may be used as the ATC code of the drug.
  • Alternatively, the codes among the primary screening candidate codes whose corresponding component does not contain the word "compound" may be used as the ATC code of the drug.
  • Step 305 determining that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
  • the anatomical therapeutic and chemical classification system codes of the drugs are determined based on multiple rules corresponding to the drug components, which improves the reliability of the determination of the ATC codes.
  • When the key drug information includes drug components and drug indications, in some optional implementations of this embodiment, FIG. 4 shows a process 400 of another embodiment of the method for obtaining the anatomical therapeutic and chemical classification system code of a drug according to the present disclosure.
  • The method for obtaining the anatomical therapeutic and chemical classification system code of a drug includes the following steps:
  • Step 401 for each code in the at least one code, detect whether the component corresponding to the code satisfies one of a plurality of rules with a priority order. If the component corresponding to the code satisfies one of the multiple rules with priority order, step 402 is executed.
  • Step 402 check whether all the codes in the at least one code have been detected; if so, go to step 403 ; if the at least one code has not been detected, return to step 401 .
  • Step 403 determine the primary screening candidate codes including the code, and then execute step 404 .
  • Step 404 detecting whether there is only one code in the candidate code for preliminary screening. If the detection result is that there is only one code, step 405 is executed. If the detection result is that the primary screening candidate codes are multiple codes, step 406 is executed.
  • Step 405 determining that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
  • Step 406 based on the indications of the drug, determine the disease type corresponding to the drug, and then perform step 407 .
  • In this embodiment, a table of correspondence between indications and disease types may optionally be preset; once the drug indications have been obtained, the disease type corresponding to them can be looked up quickly in this preset table.
  • determining the disease type corresponding to the drug based on the drug indication includes: using a pre-trained classification model to classify the indication, and obtain the disease type output by the classification model.
  • the BERT model can be used to build a classification model, so that the classification model can classify the indications in the drug instructions text, and obtain the probability values of different disease types output by the model, for example, to classify 14 disease types.
  • For example, the indications in the instructions include "for acne vulgaris, but also for seborrheic dermatitis, rosacea, and folliculitis", and classification over 14 disease types determines which type of disease the drug is used to treat.
  • These categories are digestive system, metabolic system, blood and hematopoietic organs, cardiovascular system, dermatology, genitourinary system, sex hormones, anti-infective, anti-tumor and immunological drugs, musculoskeletal system, nervous system, anti-parasitic, respiratory system and sensory system, a total of 14 disease types. These 14 categories also correspond to the 14 disease types in the ATC first-level classification.
  • For the above indication, the classification model outputs a confidence score for each of the 14 disease types.
  • For example, the classification scores output by the classification model are: digestive system (2%), metabolic system (7%), blood and hematopoietic organs (8%), cardiovascular system (5%), skin disease (80%), genitourinary system (1%), sex hormones (8%), anti-infection (10%), anti-tumor and immunological drugs (2%), musculoskeletal system (2%), nervous system (2%), anti-parasitic (2%), respiratory system (2%), sensory system (2%). The result is that the disease type for the above indication is skin disease.
  • By training a classification model that maps drug indications to disease types, the disease a drug is used to treat can be learned from the indication text in its instructions. In this embodiment, the accuracy of classification by the classification model can exceed 93%.
  • the indications extracted from the instruction text are input into the pre-trained classification model, and the disease types output by the classification model can be obtained. Further, by comparing the obtained disease type with the disease type corresponding to each code in the preliminary screening candidate code, the preferred ATC code corresponding to the drug in the preliminary screening candidate code can be obtained.
  • the classification model can improve the accuracy of disease type acquisition and ensure the reliability of ATC coding of drugs.
  • step 407 the code corresponding to the disease type is selected from the preliminary screening candidate codes as the code of the anatomical therapeutic and chemical classification system of the drug.
  • the disease type may be one or multiple; when the disease type is one, the primary screening candidate code corresponding to the disease type is the ATC code of the drug.
  • When there are multiple disease types, the primary screening candidate code that corresponds to the largest number of those disease types may optionally be used as the ATC code of the drug. Alternatively, the code corresponding to the first disease type obtained may be selected as the ATC code of the drug.
  • In this optional implementation, when the key drug information includes drug components and drug indications, the primary screening candidate codes in the at least one code are determined based on the drug components. When the primary screening candidate codes comprise multiple codes, the disease type corresponding to the drug is determined based on the drug indications, and the ATC code of the drug is determined from the primary screening candidate codes. This solves the problem of the same drug having multiple ATC codes and ensures the accuracy of the determined ATC code.
  • the present disclosure provides an embodiment of an apparatus for determining a drug code.
  • This apparatus embodiment corresponds to the method embodiment shown in FIG. 2 , and the apparatus can be specifically applied to various electronic devices.
  • an embodiment of the present disclosure provides an apparatus 500 for determining a drug code.
  • The apparatus 500 includes: an acquiring unit 501, an extracting unit 502, an obtaining unit 503, and a screening unit 504.
  • The acquiring unit 501 may be configured to acquire the instruction text of the drug.
  • the extraction unit 502 may be configured to extract the key information of the medicine in the instruction text.
  • the obtaining unit 503 may be configured to obtain at least one code related to the key drug information and components corresponding to each code based on a pre-created code inverted index.
  • the screening unit 504 may be configured to screen at least one code based on the key information of the drug and the components corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug.
  • In this embodiment, for the specific processing of the acquiring unit 501, the extracting unit 502, the obtaining unit 503, and the screening unit 504 in the apparatus 500, and the technical effects they bring about, refer to steps 201, 202, 203 and 204 of the embodiment corresponding to FIG. 2, respectively.
  • In some embodiments, the above-mentioned key drug information includes drug components, and the above-mentioned screening unit 504 includes: a detection module (not shown in the figure), a preliminary screening module (not shown in the figure), and a determination module (not shown in the figure).
  • the detection module may be configured to, for each of the at least one code, detect whether the component corresponding to the code satisfies one of a plurality of rules with a priority order, and the plurality of rules are determined based on the drug components.
  • the preliminary screening module may be configured to determine a preliminary screening candidate code including the code in response to determining that the component corresponding to the code satisfies one of the plurality of rules and all codes are detected.
  • the determining module may be configured to, in response to detecting that the primary screening candidate coding is only one coding, determine that the primary screening candidate coding is the anatomical therapeutic and chemical classification system coding of the drug product.
  • In some embodiments, the above-mentioned multiple rules are ordered from highest to lowest priority as follows: 1) when the drug has two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound"; 3) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when the drug has one drug component, the component corresponding to the code includes that drug component.
  • In some embodiments, the above-mentioned key drug information includes drug components, and the above-mentioned screening unit 504 includes: a matching module (not shown in the figure), a response module (not shown in the figure), and an encoding module (not shown in the figure).
  • the matching module may be configured to, for each code in the at least one code, detect whether the component corresponding to the code matches the drug component.
  • the response module may be configured to obtain a preliminary screening candidate code including the code in response to determining that the component corresponding to the code matches the drug component and all codes are detected.
  • the encoding module may be configured to, in response to detecting that there is only one encoding of the preliminary screening candidate encoding, determine that the preliminary screening candidate encoding is the anatomical therapeutic and chemical classification system encoding of the drug product.
  • In some embodiments, the above-mentioned key drug information further includes drug indications, and the above-mentioned screening unit 504 further includes: a classification module (not shown in the figure) and a confirmation module (not shown in the figure).
  • the classification module may be configured to, in response to detecting that the primary screening candidate codes are multiple codes, determine the disease type corresponding to the drug based on the indication of the drug.
  • The above-mentioned confirmation module may be configured to select, from the primary screening candidate codes, the code corresponding to the disease type as the anatomical therapeutic and chemical classification system code of the drug.
  • the above-mentioned classification module is further configured to use the pre-trained classification model to classify the indications, and obtain the disease types output by the classification model.
  • In the apparatus for determining a drug code provided by the embodiments of the present disclosure: first, the acquiring unit 501 acquires the instruction text of the drug; second, the extracting unit 502 extracts the key drug information from the instruction text; then, the obtaining unit 503 obtains, based on the pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code; finally, the screening unit 504 screens the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug.
  • ATC coding can be automatically performed on drugs through the pre-created coding inverted index according to the instruction text of the drug, which solves the problems faced by the majority of pharmacists in their work, and provides basic coding information for the medical information system.
  • Referring now to FIG. 6, a schematic structural diagram of an electronic device 600 suitable for implementing embodiments of the present disclosure is shown.
  • As shown in FIG. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604 .
  • The following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and communication devices 609.
  • Communication means 609 may allow electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 shows electronic device 600 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device, or may represent multiple devices as required.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 609 , or from the storage device 608 , or from the ROM 602 .
  • When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium of the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: electric wire, optical cable, RF (Radio Frequency, radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned server; or may exist alone without being assembled into the server.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the server, the server can: obtain the instruction text of the drug; extract the key information of the drug in the instruction text; based on the pre-created code Inverted index to obtain at least one code related to the key information of the drug and components corresponding to each code; based on the key information of the drug and the components corresponding to each code, screen at least one code to obtain the anatomical therapeutic and chemical classification of the drug System code.
  • Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • In this regard, each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • The described units may also be provided in a processor; for example, this may be described as: a processor including an acquiring unit, an extracting unit, an obtaining unit and a screening unit.
  • In some cases the names of these units do not limit the units themselves; for example, the acquiring unit may also be described as a unit "configured to acquire the instruction text of the drug".


Abstract

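A method and apparatus for determining a drug code, relating to the field of artificial intelligence technology. The method comprises: acquiring the instruction text of a drug (201); extracting key drug information from the instruction text (202); obtaining, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code (203); and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug (204). The method improves the efficiency of looking up codes for drugs.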

Description

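Method, apparatus, electronic device and computer medium for determining a drug code
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110054078.1, filed on January 15, 2021 and entitled "Method, apparatus, electronic device and computer medium for determining a drug code", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, specifically to the field of artificial intelligence technology, and in particular to a method, an apparatus, an electronic device, a computer-readable medium and a computer program product for determining a drug code.
Background
The Anatomical Therapeutic Chemical classification system, ATC system for short, is the World Health Organization's official classification system for drugs. As the construction of medical information systems progresses, medical institutions at all levels, medical insurance bureaus and medical insurers are gradually building precise drug management systems based on the ATC coding system.
At present, the ATC code of a drug is generally obtained by using a learned classification algorithm to make predictions from the drug's molecular structure. However, predicting a drug's ATC code from its molecular structure is technically complex and not very accurate, and it is not applicable to drugs other than newly developed ones.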
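Summary
Embodiments of the present disclosure propose a method, an apparatus, an electronic device, a computer-readable medium and a computer program product for determining a drug code.
In a first aspect, embodiments of the present disclosure provide a method for determining a drug code, the method comprising: acquiring the instruction text of a drug; extracting key drug information from the instruction text; obtaining, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code; and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the key drug information includes drug components, and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug comprises: for each code of the at least one code, detecting whether the component corresponding to the code satisfies one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components; in response to determining that the component corresponding to the code satisfies one of the plurality of rules and that all codes have been checked, determining primary screening candidate codes that include the code; and in response to detecting that there is only one primary screening candidate code, determining that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the plurality of rules are ordered from highest to lowest priority as follows: 1) when the drug has two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound" (复方); 3) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when the drug has one drug component, the component corresponding to the code includes that drug component.
In some embodiments, the key drug information includes drug components, and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug comprises: for each code of the at least one code, detecting whether the component corresponding to the code matches the drug components; in response to determining that the component corresponding to the code matches the drug components and that all codes have been checked, obtaining primary screening candidate codes that include the code; and in response to detecting that there is only one primary screening candidate code, determining that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the key drug information further includes drug indications, and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug further comprises: in response to detecting that the primary screening candidate codes comprise multiple codes, determining the disease type corresponding to the drug based on the drug indications; and selecting, from the primary screening candidate codes, the code corresponding to the disease type as the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, determining the disease type corresponding to the drug based on the drug indications comprises: classifying the indications by disease using a pre-trained classification model to obtain the disease type output by the classification model.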
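In a second aspect, embodiments of the present disclosure provide an apparatus for determining a drug code, the apparatus comprising: an acquiring unit configured to acquire the instruction text of a drug; an extracting unit configured to extract key drug information from the instruction text; an obtaining unit configured to obtain, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code; and a screening unit configured to screen the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the key drug information includes drug components, and the screening unit comprises: a detection module configured to detect, for each code of the at least one code, whether the component corresponding to the code satisfies one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components; a preliminary screening module configured to determine, in response to determining that the component corresponding to the code satisfies one of the plurality of rules and that all codes have been checked, primary screening candidate codes that include the code; and a determination module configured to determine, in response to detecting that there is only one primary screening candidate code, that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the plurality of rules are ordered from highest to lowest priority as follows: 1) when the drug has two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound"; 3) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when the drug has one drug component, the component corresponding to the code includes that drug component.
In some embodiments, the key drug information includes drug components, and the screening unit comprises: a matching module configured to detect, for each code of the at least one code, whether the component corresponding to the code matches the drug components; a response module configured to obtain, in response to determining that the component corresponding to the code matches the drug components and that all codes have been checked, primary screening candidate codes that include the code; and an encoding module configured to determine, in response to detecting that there is only one primary screening candidate code, that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the key drug information further includes drug indications, and the screening unit further comprises: a classification module configured to determine, in response to detecting that the primary screening candidate codes comprise multiple codes, the disease type corresponding to the drug based on the drug indications; and a confirmation module configured to select, from the primary screening candidate codes, the code corresponding to the disease type as the anatomical therapeutic and chemical classification system code of the drug.
In some embodiments, the classification module is further configured to classify the indications by disease using a pre-trained classification model to obtain the disease type output by the classification model.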
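In a third aspect, embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, the program implementing the method described in any implementation of the first aspect when executed by a processor.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program that implements the method described in any implementation of the first aspect when executed by a processor.
According to the method and apparatus for determining a drug code provided by the embodiments of the present disclosure: first, the instruction text of a drug is acquired; second, key drug information is extracted from the instruction text; then, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code are obtained; finally, the at least one code is screened based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug. In this way, drugs can be ATC-coded automatically from their instruction text through the pre-created code inverted index, which addresses a difficulty pharmacists face in their work and provides basic coding information for medical information systems.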
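Brief Description of the Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present disclosure may be applied;
FIG. 2 is a flowchart of one embodiment of the method for determining a drug code according to the present disclosure;
FIG. 3 is a flowchart of one embodiment of the method for obtaining the anatomical therapeutic and chemical classification system code of a drug according to the present disclosure;
FIG. 4 is a flowchart of another embodiment of the method for obtaining the anatomical therapeutic and chemical classification system code of a drug according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus for determining a drug code according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.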
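Detailed Description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which the method for determining a drug code of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, typically including wireless communication links and the like.
The terminal devices 101, 102 and 103 interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as instant messaging tools and mailbox clients.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be user equipment with communication and control functions, and this user equipment can communicate with the server 105. When they are software, they may be installed in the above user equipment; the terminal devices 101, 102 and 103 may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a backend server for determining drug codes that supports the drug processing system on the terminal devices 101, 102 and 103. The backend server can analyze and process the instruction texts of drugs on the network and feed the processing result (such as the determined ATC code) back to the terminal devices.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for determining a drug code provided by the embodiments of the present disclosure is generally executed by the server 105.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, depending on implementation needs.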
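In some optional implementations of this embodiment, FIG. 2 shows a flow 200 of one embodiment of the method for determining a drug code according to the present disclosure. The method for determining a drug code includes the following steps:
Step 201: acquire the instruction text of a drug.
In this embodiment, a drug's instructions are the statutory document that sets out important information about the drug and the statutory guide for selecting it; reading and understanding the instructions accurately before use is a precondition for safe medication. The instructions include the drug's name, specification, manufacturer, expiry date, usage, dosage, drug components, indications or functions, contraindications, adverse reactions and precautions. The drug's names include the generic name, trade name, English name, chemical name and so on. Users who clearly know a drug's generic name can generally avoid duplicate medication. The instruction text of a drug is the text that represents the content of the drug's instructions.
The execution body on which the method for determining a drug code runs may obtain the instruction text by various means, for example by acquiring it from a terminal in real time or by reading it from memory; this embodiment places no limitation on this.
Step 202: extract the key drug information from the instruction text.
In this embodiment, once the instruction text of the drug has been acquired, natural language processing can be performed on it to obtain the key information of the drug. The key drug information includes the drug components or information related to the drug components, and the information related to the drug components includes the drug's names, its indications or functions, contraindications, adverse reactions and so on. In this embodiment, the drug components may also be the main components of the drug.
Natural language processing technology is now widely applied in everyday scenarios that require semantic understanding. Entity recognition, for example, can identify the entities in a piece of text (such as drug names, disease names and treatment methods), making it possible to analyze content such as diagnoses and prescriptions in doctors' orders automatically and to manage medical information in a structured way. Text classification, for example, can be applied to intelligent triage: it parses the patient's description of their condition and, based on that description, accurately matches a consulting room, improving triage efficiency. Combining natural language processing with medical scenarios makes those scenarios more intelligent and gives users a better experience.
In this embodiment, natural language processing can extract from the instruction text the drug components, the drug's names, its indications or functions, contraindications, adverse reactions and so on. In a drug's instructions, the drug components are generally contained in a short passage of natural-language description, for example: "This product is a compound preparation; each milliliter contains 10 mg of clindamycin hydrochloride (calculated as clindamycin) and 8 mg of metronidazole. Excipients: glycerol, ethanol." With the help of a natural language processing model (for example a named entity recognition model), the main components (as opposed to auxiliary components or excipients) can be extracted; for the above instruction text, the drug components extracted by the natural language processing model include: clindamycin hydrochloride, metronidazole, glycerol and ethanol.
Optionally, a natural language model composed of BERT (Bidirectional Encoder Representations from Transformers) and a CRF (conditional random field) may be trained to obtain a trained named entity recognition model that performs entity recognition on drug components. The key drug information is obtained through this named entity recognition model. In practice, the accuracy of this model's recognition results can approach 90%, which is sufficient for actual clinical use.
Depending on their nature and characteristics, drugs include compound drugs and single-ingredient drugs. A single-ingredient drug is a preparation of a single medicinal substance and mainly contains one drug component. A compound drug is a mixed preparation of two or more medicinal substances, which may be traditional Chinese medicine, Western medicine or a mixture of the two. A compound drug contains two or more drug components. In this embodiment, for these different kinds of drugs, the drug components in the key drug information may be one component or several.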
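Step 203: based on the pre-created code inverted index, obtain at least one code related to the key drug information and the component corresponding to each code.
In this embodiment, the code inverted index is an index library created before the key drug information is extracted from the instruction text; it only needs to be created once and can then be reused.
In this embodiment, the code inverted index that is created depends on the drug code to be determined. The method provided by this application determines the ATC code of a drug, so the code inverted index may be built from the classification information of the ATC coding standard defined by the World Health Organization (ATC Chinese name, ATC English name, ATC code), inverted-indexed by group; one such code inverted index is shown in Table 1. Table 1 contains the ATC code and the Chinese and English names of the chemical substance corresponding to the ATC code, where the chemical substance is also the drug component, that is, the drug component corresponding to the ATC code. For example, "替克拉酮" corresponds to the English name "ticlatone" and to the ATC code "D01AE08".
It should be noted that a drug may have one component or several. For multiple drug components there are necessarily multiple corresponding ATC codes, while a single drug component may also correspond to multiple ATC codes; for example, in Table 1, the ATC codes of drugs containing "替加氟" (tegafur) include "L01BC03" and "L01BC53".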
Table 1
Chinese name   English name   ATC code
替克拉酮   ticlatone   D01AE08
替克洛可   teclozan   P01AC04
替利定   tilidine   N02AX01
替加氟(喃氟啶)   tegafur   L01BC03
替加氟,复方   tegafur,combinations   L01BC53
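In a practical application scenario, search engine software (for example Elasticsearch) can be used to index the ATC codes defined by the World Health Organization. With an inverted-index search engine in place, looking up a text field, such as the classification entries whose ATC Chinese name contains a given term (e.g. 甲硝唑, metronidazole), makes it easy to find every ATC code whose Chinese name contains "甲硝唑"; the results found are, for example: metronidazole, A01AB17; lansoprazole, amoxicillin and metronidazole, A02BD03; and so on. The codes and the components corresponding to the codes can therefore be obtained easily through the search engine software.
Further, the number of ATC codes returned by the search engine software can also be set. All drug components in the key drug information obtained in step 202 are put into the code inverted index to find the codes, or the components corresponding to codes, related to them; for each drug component, the maximum number of codes or corresponding components returned can be set to n (n>1), for example n = 10.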
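The lookup in step 203 can be illustrated with a small in-memory inverted index. This is only a sketch: it does not use Elasticsearch, it indexes the English names from Table 1 by token, and the names `build_inverted_index`, `lookup` and `max_results` are assumptions introduced here (with `max_results` playing the role of n).

```python
import re
from collections import defaultdict

# Rows copied from Table 1: (Chinese name, English name, ATC code).
ATC_ROWS = [
    ("替克拉酮", "ticlatone", "D01AE08"),
    ("替克洛可", "teclozan", "P01AC04"),
    ("替利定", "tilidine", "N02AX01"),
    ("替加氟(喃氟啶)", "tegafur", "L01BC03"),
    ("替加氟,复方", "tegafur,combinations", "L01BC53"),
]


def build_inverted_index(rows):
    """Map each lowercase token of the English name to the row ids containing it."""
    index = defaultdict(list)
    for row_id, (_, english_name, _) in enumerate(rows):
        for token in set(re.findall(r"[a-z]+", english_name.lower())):
            index[token].append(row_id)
    return index


def lookup(index, rows, ingredient, max_results=10):
    """Return up to max_results (ATC code, component name) pairs for one ingredient."""
    row_ids = index.get(ingredient.lower(), [])[:max_results]
    return [(rows[i][2], rows[i][1]) for i in row_ids]


INDEX = build_inverted_index(ATC_ROWS)
```

Because the index is built once and only read afterwards, it can be reused for every drug, as the text notes for the code inverted index.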
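Step 204: screen the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug.
In this embodiment, optionally, since the at least one code may be one code or several, the number of codes may be checked first once the at least one code has been obtained. When there is exactly one code, that code is the ATC code. When there is more than one code, the at least one code needs to be screened based on the key drug information and the component corresponding to each code to obtain the ATC code of the drug.
Because the codes retrieved directly from the inverted index do not necessarily fully satisfy the drug components in the key drug information, in some optional implementations of this embodiment, screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug comprises: for each code of the at least one code, detecting whether the component corresponding to the code matches the drug components; in response to determining that the component corresponding to the code matches the drug components and that all codes have been checked, obtaining primary screening candidate codes that include the code; and in response to detecting that there is only one primary screening candidate code, determining that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In this optional implementation, the drug components may be expressed in different languages. Whether a drug component matches the component corresponding to a code can be detected through the similarity of their content (Chinese names or English words), or determined from the diseases they are used to treat; for example, if the drug component and the component corresponding to the code can both treat two or more of the same diseases, the two are determined to match. Other ways of detecting whether the drug component matches the component corresponding to the code may of course be used; no limitation is placed on this.
In this embodiment, the primary screening candidate codes include every code in the at least one code that matches all the drug components of the drug, that is, at least one such code whose corresponding component matches the drug components.
In this optional implementation, the drug components in the key drug information are matched against the component corresponding to each of the at least one code, and when the matching condition is met, primary screening candidate codes including that code are obtained. After all of the at least one code have been checked, the number of codes among the primary screening candidate codes is determined. When there is only one, the ATC code of the drug is obtained. The primary screening candidate codes can thus be obtained simply by matching drug components against the inverted-index results, which is simple to implement and convenient to operate.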
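One way to read the content-similarity check is as a string-similarity threshold. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name `components_match` and the 0.8 threshold are assumptions for illustration, not values given in the text.

```python
from difflib import SequenceMatcher


def components_match(drug_component, code_component, threshold=0.8):
    """Treat two component names as matching when their similarity ratio reaches the threshold."""
    ratio = SequenceMatcher(None, drug_component.lower(), code_component.lower()).ratio()
    return ratio >= threshold


# Retrieved (ATC code, component) pairs, taken from Table 1.
retrieved = [
    ("D01AE08", "ticlatone"),
    ("L01BC03", "tegafur"),
    ("L01BC53", "tegafur,combinations"),
]

# Primary screening for a single extracted drug component.
candidates = [code for code, comp in retrieved if components_match("tegafur", comp)]
```

With a strict threshold, the combination entry is not treated as a match for the plain ingredient name; a lower threshold, or the rule-based screening described for FIG. 3, would treat it differently.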
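In other optional implementations of this embodiment, the key drug information further includes drug indications, and screening the at least one code based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug further comprises: in response to detecting that the primary screening candidate codes comprise multiple codes, determining the disease type corresponding to the drug based on the drug indications; and selecting, from the primary screening candidate codes, the code corresponding to the disease type as the anatomical therapeutic and chemical classification system code of the drug.
According to the method for determining a drug code provided by the embodiments of the present disclosure: first, the instruction text of a drug is acquired; second, key drug information is extracted from the instruction text; then, based on a pre-created code inverted index, at least one code related to the key drug information and the component corresponding to each code are obtained; finally, the at least one code is screened based on the key drug information and the component corresponding to each code to obtain the anatomical therapeutic and chemical classification system code of the drug. In this way, drugs can be ATC-coded automatically from their instruction text through the pre-created code inverted index, which addresses a difficulty pharmacists face in their work and provides basic coding information for medical information systems.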
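When the key drug information includes drug components, in some optional implementations of this embodiment, FIG. 3 shows a flow 300 of one embodiment of the method for obtaining the anatomical therapeutic and chemical classification system code of a drug according to the present disclosure. The method includes the following steps:
Step 301: for each code of the at least one code, detect whether the component corresponding to the code satisfies one of a plurality of rules having a priority order; if it does, execute step 302.
In this embodiment, the plurality of rules are determined based on the drug components. Once the component corresponding to a code satisfies any one of the rules, checked in priority order, the remaining rules need not be considered.
Specifically, the plurality of rules are ordered from highest to lowest priority as follows: 1) when the drug has two or more drug components, the component corresponding to the code includes all drug components of the drug; 2) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and contains the word "compound" (复方); 3) when the drug has two or more drug components, the component corresponding to the code includes at least one drug component of the drug and does not contain the word "compound"; 4) when the drug has one drug component, the component corresponding to the code includes that drug component.
It should be noted that the content, priority order and number of the rules may be adapted to the drug components in the drug's instruction text. For example, when the instruction text is that of a single-ingredient drug, the plurality of rules may include only 1) and 4) above. For example, when the instruction text is that of a compound drug, the plurality of rules may include only 1)-3) above.
In this optional implementation, the plurality of rules with a priority order apply to both single-ingredient drugs and compound drugs, with compound drugs considered first, which improves the reliability and comprehensiveness of checking the components corresponding to the codes.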
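The four prioritized rules can be sketched as a single function that returns the number of the first rule satisfied. This is an illustrative reading only: substring containment stands in for "includes", the marker strings `复方` and `combinations` are assumed ways of detecting the word "compound", and `matched_rule` is a hypothetical name.

```python
def matched_rule(drug_components, code_component):
    """Return the number (1-4) of the highest-priority rule satisfied, or None.

    drug_components: component names extracted from the instruction text.
    code_component: the component text attached to one ATC code.
    """
    hits = [c for c in drug_components if c in code_component]
    # Assumption: the word "compound" appears as 复方 or as "combinations".
    has_compound_marker = "复方" in code_component or "combinations" in code_component

    if len(drug_components) >= 2:
        if len(hits) == len(drug_components):
            return 1  # all drug components are included
        if hits and has_compound_marker:
            return 2  # at least one component, marked as compound
        if hits:
            return 3  # at least one component, not marked as compound
        return None
    if len(drug_components) == 1 and hits:
        return 4  # single-ingredient drug whose component is included
    return None
```

Checking the rules in this fixed order means that once a higher-priority rule matches, the lower ones are never evaluated, which mirrors the statement that the remaining rules need not be considered.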
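Step 302: check whether all of the at least one code have been checked; if so, execute step 303; if not, return to step 301.
In this embodiment, "the code" refers to each of the at least one code taken in order, that is, the current code. In step 302, if the current code satisfies one of the plurality of rules, it is placed among the primary screening candidate codes. If the current code does not satisfy any of the rules, it is discarded and the flow returns to step 301, where the code adjacent to and following the current code in the at least one code becomes the current code and is checked in turn.
Step 303: determine the primary screening candidate codes that include the code, then execute step 304.
In this embodiment, the primary screening candidate codes are the ATC codes first found to meet the requirements of the drug's instruction text. The component corresponding to each of them satisfies one of the plurality of rules with a priority order. There may be only one primary screening candidate code, or there may be several.
In this optional implementation, the primary screening candidate codes include every code in the at least one code whose corresponding component satisfies one of the plurality of rules with a priority order.
Step 304: check whether there is only one primary screening candidate code; if there is only one, execute step 305.
In this embodiment, when the detection result is that there is only one code, the current primary screening candidate code is determined to be the ATC code of the drug, and no further detection is required.
In this optional embodiment, when the primary screening candidate codes comprise multiple codes, similarity matching may optionally be performed on them, and one of the primary screening candidate codes with the highest similarity is used as the ATC code of the drug.
Optionally, when the instruction text is that of a compound drug, the codes among the primary screening candidate codes whose corresponding component contains the word "compound" may be used as the ATC code of the drug. When the instruction text is that of a compound drug, the codes among the primary screening candidate codes whose corresponding component does not contain the word "compound" may be used as the ATC code of the drug.
Step 305: determine that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
In this optional implementation, when the key drug information includes drug components, the anatomical therapeutic and chemical classification system code of the drug is determined based on the plurality of rules determined from the drug components, which improves the reliability of the determined ATC code.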
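Steps 301 to 305 amount to a filter followed by a uniqueness check. The sketch below assumes the rule check is supplied as a callable `satisfies_rule`; the names `primary_screen` and `resolve_single` are hypothetical.

```python
def primary_screen(codes, satisfies_rule):
    """Steps 301-303: keep every code whose component satisfies one of the rules.

    codes: list of (ATC code, component) pairs from the inverted index.
    satisfies_rule: callable taking a component and returning True or False.
    """
    candidates = []
    for code, component in codes:       # step 302: continue until all codes are checked
        if satisfies_rule(component):   # step 301: rule check for the current code
            candidates.append(code)
    return candidates                   # step 303: the primary screening candidates


def resolve_single(candidates):
    """Steps 304-305: a single remaining candidate is taken as the ATC code."""
    return candidates[0] if len(candidates) == 1 else None


CODES = [
    ("D01AE08", "ticlatone"),
    ("L01BC03", "tegafur"),
    ("L01BC53", "tegafur,combinations"),
]
```

When `resolve_single` returns `None` because several candidates remain, a further criterion is needed, such as the indication-based selection of FIG. 4.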
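When the key drug information includes drug components and drug indications, in some optional implementations of this embodiment, FIG. 4 shows a flow 400 of another embodiment of the method for obtaining the anatomical therapeutic and chemical classification system code of a drug according to the present disclosure. The method includes the following steps:
Step 401: for each code of the at least one code, detect whether the component corresponding to the code satisfies one of a plurality of rules having a priority order. If it does, execute step 402.
Step 402: check whether all of the at least one code have been checked; if so, execute step 403; if not, return to step 401.
Step 403: determine the primary screening candidate codes that include the code, then execute step 404.
Step 404: check whether there is only one primary screening candidate code. If there is only one code, execute step 405. If the primary screening candidate codes comprise multiple codes, execute step 406.
Step 405: determine that the primary screening candidate code is the anatomical therapeutic and chemical classification system code of the drug.
It should be understood that the operations and features in steps 401 to 405 correspond respectively to those in steps 301 to 305; the descriptions of the operations and features in steps 301 to 305 therefore apply equally to steps 401 to 405 and are not repeated here.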
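Step 406: determine the disease type corresponding to the drug based on the drug indications, then perform step 407.
In this embodiment, a table mapping indications to disease types may optionally be set up in advance. Once the drug indications are obtained, the disease type corresponding to them can be looked up quickly in this table.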
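A preset indication-to-disease-type table can be as simple as a dictionary lookup. The sketch below assumes keyword matching against the indication text; the table entries, the English labels, and the function name `disease_types_for` are hypothetical and are not taken from the source.

```python
# Hypothetical preset table: indication keyword -> disease type.
INDICATION_TO_DISEASE = {
    "acne vulgaris": "dermatological diseases",
    "seborrheic dermatitis": "dermatological diseases",
    "folliculitis": "dermatological diseases",
}


def disease_types_for(indication_text: str) -> list[str]:
    """Return the disease types whose keywords occur in the indication text."""
    text = indication_text.lower()
    found: list[str] = []
    for keyword, disease in INDICATION_TO_DISEASE.items():
        if keyword in text and disease not in found:
            found.append(disease)  # keep first-seen order, without duplicates
    return found


if __name__ == "__main__":
    print(disease_types_for("For acne vulgaris; also for seborrheic dermatitis"))
```

A table like this is fast but only covers indications listed in advance, which is one reason a trained classifier may be preferred.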
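In some optional implementations of this embodiment, determining the disease type corresponding to the drug based on the drug indications includes: classifying the indications by disease using a pre-trained classification model, and obtaining the disease type output by the classification model.
In practice, the classification model may be built on a BERT model so that it classifies the indications in the drug's package insert text by disease and outputs a probability value for each disease type, for example over 14 disease types. For instance, if the indications in the package insert include "for acne vulgaris; also for seborrheic dermatitis, rosacea, and folliculitis", the 14-class classification determines which category of disease the drug treats. The 14 disease types are: digestive system, metabolic system, blood and blood-forming organs, cardiovascular system, dermatological diseases, genitourinary system, sex hormones, anti-infectives, antineoplastic and immunomodulating agents, musculoskeletal system, nervous system, antiparasitics, respiratory system, and sensory system. These 14 classes also correspond to the 14 disease types in the first level of the ATC classification.
For the indications "for acne vulgaris; also for seborrheic dermatitis, rosacea, and folliculitis" above, the classification model outputs a confidence score for each of the 14 disease types. For example, the scores may be: digestive system (2%), metabolic system (7%), blood and blood-forming organs (8%), cardiovascular system (5%), dermatological diseases (80%), genitourinary system (1%), sex hormones (8%), anti-infectives (10%), antineoplastic and immunomodulating agents (2%), musculoskeletal system (2%), nervous system (2%), antiparasitics (2%), respiratory system (2%), and sensory system (2%). The result is that the disease type of these indications is dermatological diseases.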
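Picking the disease type from the model output amounts to taking the class with the highest confidence score. The sketch below uses the example scores quoted above; the function name `top_disease_type` is hypothetical, and the scores are treated as independent per-class confidences because they do not sum to 100%.

```python
# Example confidence scores from the description, as fractions.
scores = {
    "digestive system": 0.02,
    "metabolic system": 0.07,
    "blood and blood-forming organs": 0.08,
    "cardiovascular system": 0.05,
    "dermatological diseases": 0.80,
    "genitourinary system": 0.01,
    "sex hormones": 0.08,
    "anti-infectives": 0.10,
    "antineoplastic and immunomodulating agents": 0.02,
    "musculoskeletal system": 0.02,
    "nervous system": 0.02,
    "antiparasitics": 0.02,
    "respiratory system": 0.02,
    "sensory system": 0.02,
}


def top_disease_type(class_scores: dict[str, float]) -> str:
    """Return the disease type with the highest confidence score."""
    return max(class_scores, key=class_scores.get)


if __name__ == "__main__":
    print(top_disease_type(scores))  # dermatological diseases
```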
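By training a classification model that maps drug indications to disease types, the kind of disease a drug treats can be learned from the indication text of its package insert. In this embodiment, the classification accuracy of the model can exceed 93%.
In this optional implementation, the indications extracted from the package insert text are input into the pre-trained classification model to obtain the disease type output by the model. The disease type obtained is then compared with the disease type corresponding to each of the preliminary candidate codes, yielding the preferred ATC code of the drug among the preliminary candidate codes. The classification model improves the accuracy with which the disease type is obtained and thus the reliability of the resulting ATC code.
Step 407: from the preliminary candidate codes, select the code corresponding to the disease type as the ATC code of the drug.
In this embodiment, there may be one disease type or several. When there is one disease type, the preliminary candidate code corresponding to it is the ATC code of the drug. When there are several, the preliminary candidate code corresponding to the largest number of those disease types may optionally be taken as the ATC code of the drug. Alternatively, the code corresponding to the first disease type obtained may be selected as the ATC code of the drug. The present application places no limitation on this.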
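Step 407 can be sketched as a filter on the first character of each candidate code, since the first level of an ATC code is a single letter naming the anatomical main group (for example, D for dermatologicals and J for anti-infectives for systemic use). The mapping from the classifier's labels to ATC letters, the function name `filter_by_disease`, and the codes `D00XX00` and `J00XX00` are assumptions made for illustration; the codes are placeholders, not real ATC entries.

```python
# Assumed mapping from classifier labels to ATC first-level letters (partial).
DISEASE_TO_ATC_LEVEL1 = {
    "dermatological diseases": "D",
    "anti-infectives": "J",
}


def filter_by_disease(candidates: list[str], disease_types: list[str]) -> list[str]:
    """Keep the candidate codes whose first-level letter matches a disease type."""
    wanted = {
        DISEASE_TO_ATC_LEVEL1[d] for d in disease_types if d in DISEASE_TO_ATC_LEVEL1
    }
    return [code for code in candidates if code[:1] in wanted]


if __name__ == "__main__":
    # Placeholder codes: only the leading letter is meaningful here.
    print(filter_by_disease(["D00XX00", "J00XX00"], ["dermatological diseases"]))
```

If several codes survive the filter, one of the tie-breaking options described above (for example, taking the code matching the first disease type) would still be needed.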
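In this optional implementation, when the key drug information includes drug components and drug indications, the preliminary candidate codes are determined from the at least one code based on the drug components. When there are multiple preliminary candidate codes, the disease type corresponding to the drug is determined based on the drug indications, and the ATC code of the drug is selected from the preliminary candidate codes. This resolves the problem of one drug having multiple ATC codes and ensures the accuracy of ATC code determination.
With further reference to FIG. 5, as an implementation of the methods shown in the figures above, the present disclosure provides an embodiment of an apparatus for determining a drug code. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 5, an embodiment of the present disclosure provides an apparatus 500 for determining a drug code. The apparatus 500 includes an acquisition unit 501, an extraction unit 502, an obtaining unit 503, and a screening unit 504. The acquisition unit 501 may be configured to acquire the package insert text of a drug. The extraction unit 502 may be configured to extract key drug information from the package insert text. The obtaining unit 503 may be configured to obtain, based on a pre-created inverted index of codes, at least one code related to the key drug information and the components corresponding to each code. The screening unit 504 may be configured to screen the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug.
In this embodiment, for the specific processing of the acquisition unit 501, the extraction unit 502, the obtaining unit 503, and the screening unit 504 of the apparatus 500, and the technical effects they bring, refer to steps 201, 202, 203, and 204 in the embodiment corresponding to FIG. 2, respectively.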
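In some embodiments, the key drug information includes drug components, and the screening unit 504 includes a detection module (not shown), a preliminary screening module (not shown), and a determination module (not shown). The detection module may be configured to detect, for each code of the at least one code, whether the components corresponding to that code satisfy one of a plurality of rules having a priority order, the rules being determined based on the drug components. The preliminary screening module may be configured to determine the preliminary candidate codes that include that code, in response to determining that the components corresponding to that code satisfy one of the rules and that all codes have been checked. The determination module may be configured to determine the preliminary candidate code to be the ATC code of the drug, in response to detecting that there is only one preliminary candidate code.
In some embodiments, the rules, in descending order of priority, are as follows: 1) when the drug has two or more components, the components corresponding to the code include all of the drug's components; 2) when the drug has two or more components, the components corresponding to the code include at least one of the drug's components and contain the wording "复方" ("compound"); 3) when the drug has two or more components, the components corresponding to the code include at least one of the drug's components and do not contain the wording "复方"; 4) when the drug has one component, the components corresponding to the code include that component.
In some embodiments, the key drug information includes drug components, and the screening unit 504 includes a matching module (not shown), a response module (not shown), and a coding module (not shown). The matching module may be configured to detect, for each code of the at least one code, whether the components corresponding to that code match the drug components. The response module may be configured to obtain the preliminary candidate codes that include that code, in response to determining that the components corresponding to that code match the drug components and that all codes have been checked. The coding module may be configured to determine the preliminary candidate code to be the ATC code of the drug, in response to detecting that there is only one preliminary candidate code.
In some embodiments, the key drug information further includes drug indications, and the screening unit 504 includes a classification module (not shown) and a confirmation module (not shown). The classification module may be configured to determine the disease type corresponding to the drug based on the drug indications, in response to detecting that there are multiple preliminary candidate codes. The confirmation module may be configured to select, from the preliminary candidate codes, the code corresponding to the disease type as the ATC code of the drug.
In some embodiments, the classification module is further configured to classify the indications by disease using a pre-trained classification model and obtain the disease type output by the classification model.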
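In the apparatus for determining a drug code provided by the embodiments of the present disclosure: first, the acquisition unit 501 acquires the package insert text of a drug; second, the extraction unit 502 extracts key drug information from the package insert text; then, the obtaining unit 503 obtains, based on a pre-created inverted index of codes, at least one code related to the key drug information and the components corresponding to each code; finally, the screening unit 504 screens the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug. In this way, a drug can be assigned an ATC code automatically from its package insert text by means of the pre-created inverted index, which addresses a common difficulty in pharmacists' work and provides basic coding information for pharmaceutical information systems.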
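The role of the pre-created inverted index used by the obtaining unit 503 can be sketched as a map from component name to the set of codes that list it. How the index is actually built is not described in this passage, so the structure below, the function names `build_inverted_index` and `lookup`, and the placeholder codes are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(code_components: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert a code -> components table into a component -> codes index."""
    index: dict[str, set[str]] = defaultdict(set)
    for code, components in code_components.items():
        for component in components:
            index[component].add(code)
    return dict(index)


def lookup(index: dict[str, set[str]], drug_components: list[str]) -> list[str]:
    """Return every code related to at least one of the drug's components."""
    codes: set[str] = set()
    for component in drug_components:
        codes |= index.get(component, set())
    return sorted(codes)  # sorted for a stable, repeatable order


if __name__ == "__main__":
    # Placeholder codes and component names, for illustration only.
    table = {"CODE_1": ["amoxicillin"], "CODE_2": ["amoxicillin", "clavulanic acid"]}
    print(lookup(build_inverted_index(table), ["amoxicillin"]))
```

The codes returned by such a lookup would then be handed to the screening unit 504, together with the components listed for each code.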
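Reference is now made to FIG. 6, which shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing device (for example, a central processing unit or a graphics processing unit) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, touch pad, keyboard, or mouse; an output device 607 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 608 including, for example, a magnetic tape or hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 6 shows an electronic device 600 with various devices, it should be understood that not all of the devices shown need to be implemented or present. More or fewer devices may alternatively be implemented or present. Each block shown in FIG. 6 may represent one device or, as needed, multiple devices.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded from a network and installed via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the functions defined in the methods of the embodiments of the present disclosure are performed.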
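It should be noted that the computer-readable medium of the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to: electric wire, optical cable, RF (Radio Frequency), or any suitable combination of the above.
The computer-readable medium may be included in the server described above, or it may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: acquire the package insert text of a drug; extract key drug information from the package insert text; obtain, based on a pre-created inverted index of codes, at least one code related to the key drug information and the components corresponding to each code; and screen the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug.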
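Computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquisition unit, an extraction unit, an obtaining unit, and a screening unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as a unit "configured to acquire the package insert text of a drug".
The above description covers only preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.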

Claims (12)

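  1. A method for determining a drug code, the method comprising:
    acquiring the package insert text of a drug;
    extracting key drug information from the package insert text;
    obtaining, based on a pre-created inverted index of codes, at least one code related to the key drug information and the components corresponding to each code;
    screening the at least one code based on the key drug information and the components corresponding to each code, to obtain the Anatomical Therapeutic Chemical (ATC) classification system code of the drug.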
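  2. The method according to claim 1, wherein the key drug information comprises drug components,
    and the screening of the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug, comprises:
    for each code of the at least one code, detecting whether the components corresponding to that code satisfy one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components;
    in response to determining that the components corresponding to that code satisfy one of the plurality of rules and that all codes have been checked, determining preliminary candidate codes that include that code;
    in response to detecting that there is only one preliminary candidate code, determining the preliminary candidate code to be the ATC code of the drug.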
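  3. The method according to claim 2, wherein the plurality of rules, in descending order of priority, are as follows:
    1) when the drug has two or more components, the components corresponding to the code include all of the drug's components;
    2) when the drug has two or more components, the components corresponding to the code include at least one of the drug's components and contain the wording "复方" ("compound");
    3) when the drug has two or more components, the components corresponding to the code include at least one of the drug's components and do not contain the wording "复方";
    4) when the drug has one component, the components corresponding to the code include that component.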
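  4. The method according to claim 1, wherein the key drug information comprises drug components,
    and the screening of the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug, comprises:
    for each code of the at least one code, detecting whether the components corresponding to that code match the drug components;
    in response to determining that the components corresponding to that code match the drug components and that all codes have been checked, obtaining preliminary candidate codes that include that code;
    in response to detecting that there is only one preliminary candidate code, determining the preliminary candidate code to be the ATC code of the drug.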
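  5. The method according to any one of claims 2-4, wherein the key drug information further comprises drug indications,
    and the screening of the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug, further comprises:
    in response to detecting that there are multiple preliminary candidate codes, determining the disease type corresponding to the drug based on the drug indications;
    selecting, from the preliminary candidate codes, the code corresponding to the disease type as the ATC code of the drug.
  6. The method according to claim 5, wherein determining the disease type corresponding to the drug based on the drug indications comprises:
    classifying the indications by disease using a pre-trained classification model, to obtain the disease type output by the classification model.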
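  7. An apparatus for determining a drug code, the apparatus comprising:
    an acquisition unit configured to acquire the package insert text of a drug;
    an extraction unit configured to extract key drug information from the package insert text;
    an obtaining unit configured to obtain, based on a pre-created inverted index of codes, at least one code related to the key drug information and the components corresponding to each code;
    a screening unit configured to screen the at least one code based on the key drug information and the components corresponding to each code, to obtain the ATC code of the drug.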
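  8. The apparatus according to claim 7, wherein the key drug information comprises drug components, and the screening unit comprises:
    a detection module configured to detect, for each code of the at least one code, whether the components corresponding to that code satisfy one of a plurality of rules having a priority order, the plurality of rules being determined based on the drug components;
    a preliminary screening module configured to determine preliminary candidate codes that include that code, in response to determining that the components corresponding to that code satisfy one of the plurality of rules and that all codes have been checked;
    a determination module configured to determine the preliminary candidate code to be the ATC code of the drug, in response to detecting that there is only one preliminary candidate code.
  9. The apparatus according to claim 8, wherein the key drug information further comprises drug indications,
    and the screening unit further comprises:
    a classification module configured to determine the disease type corresponding to the drug based on the drug indications, in response to detecting that there are multiple preliminary candidate codes;
    a confirmation module configured to select, from the preliminary candidate codes, the code corresponding to the disease type as the ATC code of the drug.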
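  10. An electronic device, comprising:
    one or more processors;
    a storage device storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
  11. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
  12. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.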
PCT/CN2021/138298 2021-01-15 2021-12-15 确定药品编码的方法、装置、电子设备以及计算机介质 WO2022151896A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023553759A JP2023550212A (ja) 2021-01-15 2021-12-15 医薬品コードを決定するための方法、装置、電子機器及びコンピュータ媒体
US18/272,315 US20240071630A1 (en) 2021-01-15 2021-12-15 Method and apparatus for determining drug code, electronic device, and computer medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110054078.1 2021-01-15
CN202110054078.1A CN113821649B (zh) 2021-01-15 2021-01-15 确定药品编码的方法、装置、电子设备以及计算机介质

Publications (1)

Publication Number Publication Date
WO2022151896A1 true WO2022151896A1 (zh) 2022-07-21

Family

ID=78912354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138298 WO2022151896A1 (zh) 2021-01-15 2021-12-15 确定药品编码的方法、装置、电子设备以及计算机介质

Country Status (4)

Country Link
US (1) US20240071630A1 (zh)
JP (1) JP2023550212A (zh)
CN (1) CN113821649B (zh)
WO (1) WO2022151896A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349452A (zh) * 2023-12-04 2024-01-05 长春中医药大学 一种用于中医药物检索的信息服务系统

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955497B (zh) * 2023-04-07 2024-07-23 广州标点医药信息股份有限公司 一种中成药数据的分类方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160180028A1 (en) * 2013-09-02 2016-06-23 Fujitsu Limited Information retrieval processing device and method
CN107480425A (zh) * 2017-07-14 2017-12-15 广东医睦科技有限公司 一种基于药品编码的药品信息处理方法
CN107784611A (zh) * 2017-04-11 2018-03-09 平安医疗健康管理股份有限公司 药品编码方法及装置
CN109408631A (zh) * 2018-09-03 2019-03-01 平安医疗健康管理股份有限公司 药品数据处理方法、装置、计算机设备和存储介质
CN110827948A (zh) * 2019-10-31 2020-02-21 北京东软望海科技有限公司 用药数据处理方法、装置、电子设备及可读存储介质
US20200320139A1 (en) * 2019-04-04 2020-10-08 Iqvia Inc. Predictive system for generating clinical queries
CN111933244A (zh) * 2020-08-17 2020-11-13 医渡云(北京)技术有限公司 药品数据编码方法、装置、计算机可读介质及电子设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349452A (zh) * 2023-12-04 2024-01-05 长春中医药大学 一种用于中医药物检索的信息服务系统
CN117349452B (zh) * 2023-12-04 2024-02-09 长春中医药大学 一种用于中医药物检索的信息服务系统

Also Published As

Publication number Publication date
US20240071630A1 (en) 2024-02-29
CN113821649B (zh) 2022-11-08
JP2023550212A (ja) 2023-11-30
CN113821649A (zh) 2021-12-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21919085

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023553759

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 18272315

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 11202305186V

Country of ref document: SG

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.10.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21919085

Country of ref document: EP

Kind code of ref document: A1