WO2020220635A1 - Drug classification method and apparatus, computer device, and storage medium - Google Patents

Drug classification method and apparatus, computer device, and storage medium

Info

Publication number
WO2020220635A1
WO2020220635A1 PCT/CN2019/117240 CN2019117240W WO2020220635A1 WO 2020220635 A1 WO2020220635 A1 WO 2020220635A1 CN 2019117240 W CN2019117240 W CN 2019117240W WO 2020220635 A1 WO2020220635 A1 WO 2020220635A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
word vector
target feature
word
euclidean distance
Prior art date
Application number
PCT/CN2019/117240
Other languages
English (en)
Chinese (zh)
Inventor
陈娴娴
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to SG11202008417RA
Publication of WO2020220635A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The embodiments of this application relate to the field of drug classification, and in particular to a drug classification method, device, computer device, and storage medium.
  • Drug classification management is an internationally accepted management method. It divides drugs into prescription drugs and non-prescription drugs and makes corresponding management regulations based on the safety and effectiveness principles of drugs, according to their varieties, specifications, indications, dosages and routes of administration. Its significance is to ensure the safety of people's medication.
  • Existing drug classification models are mainly supervised models, which require substantial manual labor up front to label samples.
  • The inventors recognized that manual labeling is often inaccurate and the resulting classification incomplete, so considerable manpower is needed for maintenance operations such as adding and modifying categories. As a result, drug classification is time-consuming and labor-intensive, and its accuracy is low.
  • The embodiments of this application provide a drug classification method, device, computer device, and storage medium that can classify drugs without manual labeling.
  • One technical solution adopted in the embodiments of this application is a drug classification method, including: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that clusters by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • An embodiment of this application also provides a drug classification device, including: an acquisition module for obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information; a processing module for inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that clusters by calculating the distance between different feature word vectors; and an execution module for classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • An embodiment of this application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions, and the processor executes the steps of a drug classification method.
  • The drug classification method includes the following steps: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that clusters by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • The embodiments of this application also provide a storage medium storing computer-readable instructions.
  • The drug classification method includes the following steps: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that clusters by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • the embodiments of the application can improve the efficiency of drug classification, and the use of case information can further strengthen the correspondence between drugs and disease conditions, and improve the accuracy of the classification results.
  • FIG. 1 is a schematic diagram of the basic flow of the drug classification method according to an embodiment of this application.
  • FIG. 2 is a schematic diagram of the process of collecting a first word vector through a neural network model according to an embodiment of this application.
  • FIG. 3 is a schematic diagram of the process of extracting word vectors through keyword sets according to an embodiment of this application.
  • FIG. 4 is a schematic diagram of the process of generating a first-level cluster set according to an embodiment of this application.
  • FIG. 5 is a schematic diagram of the process of generating a second-level cluster set according to an embodiment of this application.
  • FIG. 6 is a schematic diagram of the process of generating a third-level cluster set according to an embodiment of this application.
  • FIG. 7 is a schematic diagram of three-level classification according to an embodiment of this application.
  • FIG. 8 is a schematic diagram of the basic structure of a drug classification device according to an embodiment of this application.
  • FIG. 9 is a block diagram of the basic structure of a computer device according to an embodiment of this application.
  • Fig. 1 is a schematic diagram of the basic flow of the drug classification method in this embodiment.
  • A drug classification method includes:
  • S1100 Obtain, according to the user's case information, a target feature word vector that characterizes the user's condition and drug use, where the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information;
  • The user's behavior information is recorded throughout the entire process.
  • The recorded behavior information includes the user's medical condition, the drugs used, and the user's laboratory results.
  • This behavior information is defined as the user's medical condition information.
  • In this embodiment the medical condition information is text information, but it is not limited to this. Depending on the application scenario, in some embodiments the medical condition information further includes picture information and sound information.
  • The target feature word vector in the case information is vector information that characterizes the user's condition and the drugs used.
  • The target feature word vector can be extracted by a neural network model that has been trained to a convergent state.
  • Alternatively, the target feature word vector can be extracted by calculating the word frequency of the keywords in the case information.
  • In this embodiment, word vectors are first extracted through a neural network model, then word vectors are calculated with the word frequency statistics method, and finally the results of the two methods are combined to obtain the target feature word vector.
  • S1200 Input the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;
  • The drug classification model is an unsupervised model that clusters feature word vectors.
  • The unsupervised training model calculates the inter-class distance between different feature word vectors and sets a distance threshold as the measure; feature word vectors whose inter-class distance is less than the distance threshold are clustered together to generate a cluster set.
  • Each cluster set is one classification category of drugs.
  • Calculating the inter-class distance in effect measures the similarity of the condition information associated with different drugs.
  • The smaller the inter-class distance, the closer the efficacy of the drugs; the greater the inter-class distance, the greater the difference in efficacy. Therefore, different classification categories correspond to different treatments or curative effects.
  • The classification categories are divided into levels: after the first-level division is completed, further classification is performed within each cluster set.
  • This is done by reducing the distance threshold, so that the feature word vectors within a cluster set are distinguished further.
  • Reducing the parameter value of the effective point spacing in the feature word vectors makes their intra-class distances converge further, and this convergence in turn increases the inter-class distance between feature word vectors. The differentiation between feature word vectors within a cluster set therefore increases, which provides good conditions for further subdividing the categories in that cluster set.
  • In this embodiment the cluster sets are divided into three levels, but this is not limiting: depending on the application scenario, the cluster sets can be divided into one, two, four, five, or more levels.
  • S1300 Classify and label the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • Each cluster set, and the drugs in the last-level cluster set, are labeled.
  • A cluster set is labeled as follows: the word with the highest frequency in the case information of the drugs in the cluster set is extracted as the label name of the cluster set. In some embodiments, when there are multiple label names, they are selected in turn according to their frequency ranking. To label a drug, the drug name is extracted directly from the case information.
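  • The labeling rule above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes each case record has already been tokenized and stripped of stop words, and the function name `label_cluster` and the sample words are hypothetical.

```python
from collections import Counter

def label_cluster(case_keywords, top_n=1):
    """Return the top_n most frequent words across a cluster's case records.

    case_keywords: list of token lists, one per case record in the cluster
    (assumed to be tokenized and stop-word filtered beforehand).
    """
    counts = Counter()
    for words in case_keywords:
        counts.update(words)
    # most_common sorts by descending frequency
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical cluster of three case records
cluster_cases = [
    ["fever", "cough", "ibuprofen"],
    ["fever", "headache", "paracetamol"],
    ["fever", "cough", "aspirin"],
]
labels = label_cluster(cluster_cases, top_n=2)
```

  • With `top_n=1` the function returns only the single most frequent word; larger values take further label names in frequency order, as described for embodiments with multiple label names.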
  • The name of a drug and the medical condition information corresponding to it can be obtained by collecting the user's case information. The medical condition information corresponding to the drug name is converted into the target feature word vector, and the target feature word vector is fed as input data to the unsupervised drug classification model.
  • The drug classification model clusters drugs that can treat the same or similar conditions together to form a cluster category, and each cluster category can serve as one category of the drug classification.
  • The drug classification is completed by labeling the names of the drugs in each classification category.
  • This classification method can improve the efficiency of drug classification, and the use of case information can further strengthen the correspondence between drugs and disease conditions, and improve the accuracy of the classification results.
  • FIG. 2 is a schematic diagram of the process of collecting the first word vector through the neural network model in this embodiment.
  • S1100 includes:
  • The case information is converted into a vector set that the neural network model can recognize and process.
  • In this embodiment, the case information is converted into a vector set through the word2vec model.
  • In some embodiments, the case information can also be vectorized through TF-IDF (term frequency-inverse document frequency).
  • S1112 Input the behavior vector set into a preset feature extraction model, where the feature extraction model is a neural network model that is pre-trained to a convergent state and extracts, from the behavior vector set, the vectors that represent user behavior;
  • The converted vector set is input into the preset feature extraction model.
  • The feature extraction model extracts the word vectors in the vector set that are associated with the user's medical condition information and drug information.
  • The training method is as follows: collect a training sample set composed of the vector sets converted from several pieces of case information, manually label the word vectors in each vector set, and then input the labeled vector sets into the neural network model in turn. After the neural network model extracts the excitation word vector, the distance between the excitation word vector and the labeled word vector is calculated. If the distance is greater than the set distance threshold, the weights of the neural network model are corrected through the back propagation algorithm.
  • Otherwise, training on that vector set is passed. The vector sets in the training sample set are trained by the above method until a set value (for example, 98%) is reached.
  • the feature extraction model trained to the convergent state can accurately extract the word vector associated with the user's condition information and drug information in the vector set, and the word vector is the user behavior vector.
  • the extracted user behavior vector can be used as the input data of the drug classification model.
  • the user behavior vector is defined as the first word vector.
  • the neural network model trained to convergence can quickly extract word vectors that record key information, simplifying the data processing procedures of the drug classification model, and improving the processing efficiency of the drug classification model.
  • FIG. 3 is a schematic diagram of the process of extracting word vectors through keyword sets in this embodiment.
  • Stop words are words that are filtered out.
  • A stop word list is set up to record the stop words obtained through statistics. For example, if verbs, adverbs, and adjectives are set as stop words, filtering through the stop word list removes the words with those parts of speech from the case information. The case information remaining after the stop words are removed forms a keyword set, which records the user's condition information and drug information.
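  • A minimal sketch of the stop word filtering step is shown below. The stop word list, the whitespace tokenization, and the function name `build_keyword_set` are assumptions for illustration; the application describes stop words chosen by part of speech, which would need a part-of-speech tagger not shown here.

```python
# Hypothetical stop word list; the application derives its list statistically
# or by part of speech (verbs, adverbs, adjectives).
STOP_WORDS = {"the", "was", "taking", "daily", "for", "severe"}

def build_keyword_set(tokens, stop_words=STOP_WORDS):
    """Drop stop words and keep the remaining tokens in their original order.

    Duplicates are kept so that word frequencies can be counted afterwards.
    """
    return [t for t in tokens if t.lower() not in stop_words]

# Naive whitespace tokenization, assumed only for this example
tokens = "the patient was taking ibuprofen daily for severe headache".split()
keywords = build_keyword_set(tokens)
```

  • In this example the surviving keywords are `patient`, `ibuprofen`, and `headache`, that is, the condition and drug terms.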
  • The word frequency of each keyword in the keyword set is calculated.
  • The word frequency is calculated as:
  • The inverse document frequency is used to determine the importance of each keyword.
  • The inverse document frequency is inversely proportional to how common a word is. The inverse document frequency is calculated as:
  • The keywords are sorted by priority in descending order, and the top-ranked keywords are selected as the keywords to be converted according to actual needs. For example, the top 20 keywords are extracted as the keywords to be converted.
  • The number of keywords to be converted is not limited to this; depending on the application scenario, in some embodiments it can be any value.
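  • The formulas are not reproduced in this text, so the following sketch falls back on the common textbook definitions, which are an assumption here: term frequency is a keyword's count divided by the number of keywords in the document, inverse document frequency is log(N / (1 + df)), and the priority value is their product. The function names and sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(doc_keywords, corpus):
    """Score each keyword of one document by TF * IDF (assumed priority value)."""
    counts = Counter(doc_keywords)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_keywords)                # share of this document's keywords
        df = sum(1 for doc in corpus if word in doc)  # documents containing the word
        idf = math.log(len(corpus) / (1 + df))        # +1 smoothing is an assumption
        scores[word] = tf * idf
    return scores

def top_keywords(scores, k):
    """Return the k keywords with the highest priority, in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# Hypothetical corpus of four keyword sets
corpus = [
    ["fever", "fever", "cough", "ibuprofen"],
    ["fever", "headache"],
    ["rash", "cream"],
    ["fever", "nausea"],
]
scores = tf_idf_scores(corpus[0], corpus)
selected = top_keywords(scores, k=2)
```

  • Here `fever` appears in three of the four documents, so its score drops to zero and the rarer `cough` and `ibuprofen` are selected. The same function with `k=20` would correspond to the top-20 example in the text.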
  • S1124 Generate the second word vector according to the priority value of each keyword.
  • the keywords to be converted that have been filtered by the priority value are converted into the second word vector.
  • The first word vector is extracted by the neural network model. The relationship between the word vectors extracted by the neural network model and the text information essentially carries people's subjective intent and is obtained through repeated directed training and learning. However, the neural network model is difficult to converge during cross-training on multiple association relationships, so the extracted first word vector may be insufficiently comprehensive or may omit keyword vectors.
  • The second word vector is obtained by statistics after stop word filtering. No personal intent is involved in the statistics, so it most directly reflects the distribution of each keyword and extracts word vectors more comprehensively, but without emphasis.
  • The target feature word vector generated by merging the two has more comprehensive data: it highlights the feature word vectors that people focus on while fully integrating the feature word vectors that objectively exist. The extracted data is therefore both comprehensive and focused, which helps improve the accuracy of the drug classification model. See step S1131.
  • The first word vector and the second word vector are merged as follows: the word vector matrix composed of the first word vectors is added to the word vector matrix composed of the second word vectors, and the result of the operation is the target feature word vector.
  • This vector matrix is the input data of the drug classification model.
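  • The merge by matrix addition can be sketched as below. The text does not say how the two matrices are brought to the same shape, so this example simply assumes they already match and raises an error otherwise; the function name and values are hypothetical.

```python
def merge_word_vectors(first, second):
    """Element-wise sum of two word-vector matrices of identical shape.

    Assumption: both matrices have the same number of rows and columns;
    the source does not describe any alignment or padding step.
    """
    same_rows = len(first) == len(second)
    if not same_rows or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]

# Hypothetical 2 x 2 matrices of first and second word vectors
first = [[0.5, 0.25], [1.0, 0.0]]
second = [[0.25, 0.25], [0.0, 2.0]]
target = merge_word_vectors(first, second)
```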
  • the drug classification model generates a first-level cluster set, and the cluster set of the target feature word vector needs to be judged by calculating the Euclidean distance between the target feature word vector and different feature word vectors.
  • FIG. 4 is a schematic diagram of the process of generating a first-level cluster set in this embodiment.
  • S1200 includes:
  • S1211 Calculate the first Euclidean distance between the target feature word vector and different feature word vectors;
  • The distance between the target feature word vector and the other feature word vectors needs to be calculated. Specifically, the Euclidean distance between the target feature word vector and each of the different feature word vectors is calculated, and these Euclidean distances are collectively referred to as the first Euclidean distance. However, it is not limited to this. In some embodiments, the Mahalanobis distance or the cosine distance between the target feature word vector and the different feature word vectors is calculated instead.
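  • For reference, two of the distance measures named above can be written as follows. This is a plain-Python sketch with hypothetical function names; the Mahalanobis distance is omitted because it additionally requires the inverse covariance matrix of the data.

```python
import math

def euclidean_distance(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

d_euclid = euclidean_distance([3.0, 0.0], [0.0, 4.0])
d_cosine = cosine_distance([3.0, 0.0], [0.0, 4.0])
```

  • Euclidean distance is sensitive to vector magnitude, whereas cosine distance depends only on direction, so the thresholds used for one measure do not carry over to the other.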
  • The first Euclidean distance between the target feature word vector and each of the different feature word vectors is compared with the set first distance threshold.
  • The first distance threshold measures whether feature word vectors meet the first screening condition; for example, the first distance threshold is 0.5.
  • When the first Euclidean distance to a feature word vector is less than the first distance threshold, the target feature word vector is clustered into the cluster set where that feature word vector is located. After all the target feature word vectors of the case information have been clustered, a first-level cluster set is generated, and the first-level cluster set is composed of at least one cluster set.
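  • One way to read the first-level clustering step is sketched below. It assumes an incremental scheme in which each vector joins the first existing cluster set that contains a member closer than the threshold, and otherwise starts a new cluster set; the join order, the absence of cluster merging, and all names and sample vectors are assumptions rather than details from the application.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def threshold_cluster(vectors, threshold):
    """Group vector indices so that each vector joins the first cluster
    holding a member within `threshold`; otherwise it opens a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if any(euclidean(vec, vectors[j]) < threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no existing cluster was close enough
    return clusters

# Hypothetical 2-D feature word vectors
vectors = [[0.0, 0.0], [0.3, 0.0], [2.0, 2.0], [2.2, 2.1], [0.1, 0.2]]
first_level = threshold_cluster(vectors, threshold=0.5)
```

  • With the example threshold of 0.5, vectors 0, 1, and 4 form one cluster set and vectors 2 and 3 form another.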
  • The drug classification model generates a second-level cluster set by further refining the clustering on the basis of the first-level cluster set.
  • FIG. 5 is a schematic diagram of the process of generating a second-level cluster set in this embodiment.
  • The effective point spacing refers to the inter-class distances in each feature word vector that are not ignored. For computational efficiency, the intra-class distances need to be filtered before the inter-class distance is calculated.
  • The filtering is done by setting the parameter value of the effective point spacing.
  • Inter-class distances smaller than the effective point spacing are judged invalid. Therefore, decreasing the parameter value of the effective point spacing increases the diversity of the intra-class distances, reveals finer detail in each feature word vector, and increases the difference between feature word vectors in the same cluster set, which is conducive to second-level clustering.
  • The corrected parameter value of the effective point spacing is the first parameter value.
  • The first parameter value is smaller than the parameter value of the effective point spacing that the drug classification model used for the first-level cluster set.
  • After the first parameter value is set, the drug classification model performs second-level clustering within each cluster set of the first-level cluster set.
  • The second-level clustering method is as follows: within the cluster set where the target feature word vector is located, the second Euclidean distance between the target feature word vector and the other feature word vectors is calculated.
  • In some embodiments, the second Euclidean distance can be replaced by the Mahalanobis distance or the cosine distance between the target feature word vector and the different feature word vectors.
  • The second Euclidean distance between the target feature word vector and each of the different feature word vectors is compared with the set second distance threshold.
  • The second distance threshold measures whether feature word vectors meet the second screening condition; for example, the second distance threshold is 0.1.
  • This comparison determines which feature word vector, or which type of feature word vector, in its cluster set the target feature word vector should be clustered with.
  • When the second Euclidean distance to a feature word vector is less than the second distance threshold, the target feature word vector is clustered into the cluster set where that feature word vector is located.
  • After this clustering, a second-level cluster set is generated, and the second-level cluster set is composed of at least one cluster set.
  • The drug classification model generates a third-level cluster set by further refining the clustering on the basis of the second-level cluster set.
  • FIG. 6 is a schematic diagram of the process of generating a third-level cluster set in this embodiment.
  • The filtering is done by setting the parameter value of the effective point spacing.
  • Inter-class distances smaller than the effective point spacing are judged invalid. Therefore, decreasing the parameter value of the effective point spacing increases the diversity of the intra-class distances, reveals finer detail in each feature word vector, and increases the difference between feature word vectors in the same cluster set, which is conducive to third-level clustering.
  • The corrected parameter value of the effective point spacing is the second parameter value. The second parameter value is smaller than the first parameter value.
  • After the second parameter value is set, the drug classification model performs third-level clustering within each cluster set of the second-level cluster set.
  • The third-level clustering method is as follows: within the cluster set where the target feature word vector is located, the third Euclidean distance between the target feature word vector and the other feature word vectors is calculated. But it is not limited to this. In some embodiments, the third Euclidean distance can be replaced by the Mahalanobis distance or the cosine distance between the target feature word vector and the different feature word vectors.
  • The third Euclidean distance between the target feature word vector and each of the different feature word vectors is compared with the set third distance threshold.
  • The third distance threshold measures whether feature word vectors meet the third screening condition; for example, the third distance threshold is 0.05.
  • Comparing the third Euclidean distance with the preset third distance threshold determines which feature word vector, or which type of feature word vector, in its cluster set the target feature word vector should be clustered with.
  • When the third Euclidean distance to a feature word vector is less than the third distance threshold, the target feature word vector is clustered into the cluster set where that feature word vector is located.
  • After this clustering, a third-level cluster set is generated, and the third-level cluster set is composed of at least one cluster set. This completes the three-level classification of drugs, but the number of classification levels is not limited to this.
  • The parameter value of the effective point spacing and the distance threshold can be corrected further to refine the classification further.
  • FIG. 7 is a schematic diagram of the three-level classification in this embodiment.
  • the classification of drugs is divided into three levels, namely: a first-level cluster set 11, a second-level cluster set 12, and a third-level cluster set 13.
  • the cluster sets of three different levels are arranged in a dendrogram.
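  • The three-level arrangement can be sketched as a nested tree in which each level re-clusters the members of its parent cluster set with a smaller distance threshold. The example thresholds 0.5, 0.1, and 0.05 come from the text; the incremental join rule, the function names, and the sample vectors are assumptions, and the effective point spacing parameter is not modeled.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_indices(vectors, indices, threshold):
    """Threshold clustering restricted to the given indices."""
    clusters = []
    for i in indices:
        for cluster in clusters:
            if any(euclidean(vectors[i], vectors[j]) < threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def multilevel_cluster(vectors, thresholds, indices=None):
    """Return a nested list with one nesting level per threshold.

    Each level clusters only within the cluster sets of the level above.
    """
    if indices is None:
        indices = list(range(len(vectors)))
    if not thresholds:
        return indices  # leaf: indices of the finest cluster set
    groups = cluster_indices(vectors, indices, thresholds[0])
    return [multilevel_cluster(vectors, thresholds[1:], g) for g in groups]

# Hypothetical 1-D feature word vectors
vectors = [[0.0], [0.03], [0.09], [0.3], [1.0]]
tree = multilevel_cluster(vectors, thresholds=(0.5, 0.1, 0.05))
```

  • In the resulting tree, vectors 0 to 3 share a first-level cluster set, vectors 0 to 2 share a second-level cluster set, and only vectors 0 and 1 remain together at the third level. Passing a longer tuple of thresholds yields more levels.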
  • the embodiment of the present application also provides a medicine classification device.
  • FIG. 8 is a schematic diagram of the basic structure of the medicine classification device of this embodiment.
  • a medicine classification device includes: an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • The acquisition module 2100 is configured to obtain, according to the user's case information, the target feature word vector that characterizes the user's condition and the drugs used, where the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information by a neural network model, and the second word vector is obtained by statistics after stop words are filtered from the text information;
  • The processing module 2200 is configured to input the target feature word vector into the preset drug classification model, where the drug classification model is an unsupervised training model that clusters by calculating the distance between different feature word vectors;
  • The execution module 2300 is configured to classify and label the used drugs according to the cluster set of the used drugs output by the drug classification model, where the classification label is at least one high-frequency word in the cluster set of the used drugs.
  • When the drug classification device classifies drugs, it obtains the name of each drug and the disease information corresponding to that drug by collecting the user's case information, converts the disease information corresponding to the drug name into a target feature word vector, and feeds the target feature word vector as input data into the unsupervised drug classification model.
  • The drug classification model clusters drugs that can cure the same or similar conditions together to form a cluster category, and this cluster category can become a drug classification category.
  • Finally, the drug classification is completed by labeling the names of the drugs in the classification category.
  • This classification method can improve the efficiency of drug classification, and the use of case information strengthens the correspondence between drugs and disease conditions, which improves the accuracy of the classification results.
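The labeling step described above (annotating a cluster with at least one high-frequency word) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `label_cluster`, the sample keywords, and the alphabetical tie-breaking rule are all assumptions.

```python
from collections import Counter

def label_cluster(case_keywords, top_n=1):
    """Return the top_n most frequent keywords across all cases in one cluster."""
    counts = Counter(word for keywords in case_keywords for word in keywords)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical keyword lists for three cases that fell into the same cluster.
cluster = [
    ["hypertension", "headache"],
    ["hypertension", "dizziness"],
    ["hypertension", "headache", "fatigue"],
]
print(label_cluster(cluster, top_n=2))  # ['hypertension', 'headache']
```

The returned words would then serve as the classification label for every drug in that cluster.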
  • Where the target feature word vector includes a first word vector, the drug classification device includes a first conversion submodule, a first processing submodule, and a first execution submodule.
  • The first conversion submodule is configured to convert the case information into a behavior vector set.
  • The first processing submodule is configured to input the behavior vector set into a preset feature extraction model, where the feature extraction model is a neural network model that has been pre-trained to a convergent state and is used to extract the user behavior vector represented in the behavior vector set.
  • The first execution submodule is configured to read the user behavior vector output by the feature extraction model and to define the user behavior vector as the first word vector.
  • Where the target feature word vector includes a second word vector, the drug classification device includes a first filtering submodule, a second processing submodule, a first calculation submodule, and a second execution submodule.
  • The first filtering submodule is configured to filter the case information through a preset stop word list to generate a keyword set.
  • The second processing submodule is configured to count the word frequency and the inverse document frequency of each keyword in the keyword set.
  • The first calculation submodule is configured to calculate the priority value of each keyword from its word frequency and inverse document frequency.
  • The second execution submodule is configured to generate the second word vector according to the priority value of each keyword.
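The stop word filtering and word frequency / inverse document frequency steps above can be sketched in a few lines. The source does not give the formula for the priority value; this sketch assumes the usual TF-IDF product with a smoothed IDF term. The stop word list, the whitespace tokenizer, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop word list; the source only says it is "preset".
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "with", "has"}

def extract_keywords(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def second_word_vector(doc_keywords, corpus_keywords, vocabulary):
    """Return one TF-IDF priority value per vocabulary word for a single case."""
    counts = Counter(doc_keywords)
    total = len(doc_keywords)
    n_docs = len(corpus_keywords)
    vector = []
    for word in vocabulary:
        tf = counts[word] / total if total else 0.0
        doc_freq = sum(1 for keywords in corpus_keywords if word in keywords)
        # Smoothed IDF (an assumption) keeps the value finite and positive.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        vector.append(tf * idf)
    return vector

# Hypothetical case texts standing in for real case information.
cases = [
    "the patient has a cough and fever",
    "fever with headache",
    "cough treated with syrup",
]
corpus = [extract_keywords(text) for text in cases]
vocabulary = sorted({word for keywords in corpus for word in keywords})
vector = second_word_vector(corpus[0], corpus, vocabulary)
```

In this toy corpus, "patient" appears in only one case and therefore receives a higher priority value than "cough", which appears in two.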
  • The drug classification device includes a first merging submodule configured to merge the first word vector and the second word vector to generate the target feature word vector.
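The source does not say how the two word vectors are merged. One common choice, assumed here purely for illustration, is to concatenate them, optionally after L2-normalizing each part so that neither dominates the Euclidean distance. The names `l2_normalize` and `merge_word_vectors` are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector stays all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else [0.0 for _ in vec]

def merge_word_vectors(first_vec, second_vec, normalize=True):
    """Concatenate the first and second word vectors (assumed merge strategy)."""
    if normalize:
        first_vec, second_vec = l2_normalize(first_vec), l2_normalize(second_vec)
    return [float(x) for x in first_vec] + [float(x) for x in second_vec]

target = merge_word_vectors([3.0, 4.0], [0.0, 2.0])
print(target)  # [0.6, 0.8, 0.0, 1.0]
```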
  • The drug classification device includes a first calculation submodule, a first comparison submodule, and a third execution submodule.
  • The first calculation submodule is configured to calculate the first Euclidean distance between the target feature word vector and different feature word vectors.
  • The first comparison submodule is configured to compare the first Euclidean distance with a preset first distance threshold.
  • The third execution submodule is configured to, when the first Euclidean distance is less than the first distance threshold, cluster the target feature word vector into the cluster set represented by the first Euclidean distance to generate a first-level cluster set.
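The first-level step above (compute a Euclidean distance, compare it with a threshold, join the matching cluster) can be sketched as follows. The source does not state what happens when no distance falls below the threshold; opening a new cluster is an assumption of this sketch, as are the function names and sample vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_cluster(target, clusters, threshold):
    """Add target to the nearest cluster within threshold and return its index.

    clusters is a list of clusters, each a list of vectors. If no member of any
    cluster is closer than threshold, a new cluster is opened (an assumption).
    """
    best_index, best_distance = None, threshold
    for index, members in enumerate(clusters):
        distance = min(euclidean(target, member) for member in members)
        if distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is None:
        clusters.append([target])
        return len(clusters) - 1
    clusters[best_index].append(target)
    return best_index

clusters = [[[0.0, 0.0]], [[5.0, 5.0]]]
near = assign_to_cluster([0.3, 0.4], clusters, threshold=1.0)  # distance 0.5 to cluster 0
far = assign_to_cluster([9.0, 9.0], clusters, threshold=1.0)   # no cluster within 1.0
```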
  • The drug classification device includes a second calculation submodule, a second comparison submodule, and a fourth execution submodule.
  • The second calculation submodule is configured to correct the parameter value of the effective point spacing in the drug classification model to generate a first parameter value, and to calculate the second Euclidean distance between the target feature word vector and different feature word vectors in the first-level cluster set.
  • The second comparison submodule is configured to compare the second Euclidean distance with a preset second distance threshold, where the second distance threshold is less than the first distance threshold.
  • The fourth execution submodule is configured to, when the second Euclidean distance is less than the second distance threshold, cluster the target feature word vector into the cluster set represented by the second Euclidean distance to generate a second-level cluster set.
  • The drug classification device includes a third calculation submodule, a third comparison submodule, and a fifth execution submodule.
  • The third calculation submodule is configured to correct the parameter value of the effective point spacing in the drug classification model to generate a second parameter value, and to calculate the third Euclidean distance between the target feature word vector and different feature word vectors in the second-level cluster set.
  • The third comparison submodule is configured to compare the third Euclidean distance with a preset third distance threshold, where the third distance threshold is less than the second distance threshold.
  • The fifth execution submodule is configured to, when the third Euclidean distance is less than the third distance threshold, cluster the target feature word vector into the cluster set represented by the third Euclidean distance to generate a third-level cluster set.
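The three levels described above, each re-clustering the members of the previous level with a smaller distance threshold, can be sketched as a recursive procedure that yields a nested, dendrogram-like structure. This is a simplified illustration under stated assumptions: it uses a greedy threshold rule, does not model the correction of the effective point spacing parameter, and all names, thresholds, and vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Greedy clustering: join the first cluster with a member closer than threshold."""
    clusters = []
    for vec in vectors:
        for members in clusters:
            if any(euclidean(vec, member) < threshold for member in members):
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters

def hierarchical_cluster(vectors, thresholds):
    """Cluster recursively, one level per threshold; thresholds must decrease."""
    if any(b >= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("each threshold must be smaller than the previous one")
    if not thresholds:
        return vectors
    first, rest = thresholds[0], thresholds[1:]
    return [hierarchical_cluster(c, rest) for c in threshold_cluster(vectors, first)]

vectors = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [10.0, 0.0]]
tree = hierarchical_cluster(vectors, thresholds=[3.0, 1.0, 0.2])
```

With these sample values, the first level separates `[10.0, 0.0]` from the other three vectors, the second level splits off `[2.0, 0.0]`, and the third level separates `[0.0, 0.0]` from `[0.5, 0.0]`.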
  • FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device includes a processor, a storage medium, a memory, and a network interface connected through a system bus.
  • The storage medium may be volatile or non-volatile.
  • The storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • The database may store control information sequences, which are readable by the computer.
  • By executing the computer-readable instructions, the processor can implement a drug classification method.
  • The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device.
  • Computer-readable instructions may be stored in the memory of the computer device; when executed by the processor, they cause the processor to execute a drug classification method.
  • The network interface of the computer device is used to connect to and communicate with a terminal.
  • FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • The processor is used to execute the specific functions of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 8, and the memory stores the program code and data required to execute these modules.
  • The network interface is used for data transmission to and from user terminals or servers.
  • The memory in this embodiment stores the program code and data required to execute all the submodules of the drug classification device, and the server can call this program code and data to execute the functions of all the submodules.
  • When the computer device classifies drugs, it obtains the name of each drug and the disease information corresponding to that drug by collecting the user's case information, converts the disease information corresponding to the drug name into a target feature word vector, and feeds the target feature word vector as input data into the unsupervised drug classification model.
  • The drug classification model clusters drugs that can cure the same or similar conditions together to form a cluster category, and this cluster category can become a drug classification category. Finally, the drug classification is completed by labeling the names of the drugs in the classification category.
  • This classification method can improve the efficiency of drug classification, and the use of case information strengthens the correspondence between drugs and disease conditions, which improves the accuracy of the classification results.
  • The present application also provides a storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the drug classification method in any of the foregoing embodiments.
  • The computer program can be stored in a computer-readable storage medium; when executed, the program may include the processes of the above method embodiments.
  • The aforementioned storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A drug classification method and apparatus, a computer device, and a storage medium are disclosed. The method comprises: obtaining, on the basis of a patient's case information, target feature word vectors representing the disease condition of the patient and the drugs used by the patient; inputting the target feature word vectors into a preset drug classification model, the classification model being an unsupervised learning model that clusters by calculating distances between different feature word vectors; and, on the basis of a cluster set of the drugs used that is output by the classification model, performing classification annotation on the drugs used, the classification annotation being at least one high-frequency word in the cluster set. The classification method improves classification efficiency, and its use of case information further strengthens the correspondence between drug and disease condition, thereby increasing the accuracy of the classification results.
PCT/CN2019/117240 2019-09-18 2019-11-11 Drug classification method and apparatus, computer device, and storage medium WO2020220635A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG11202008417RA SG11202008417RA (en) 2019-09-18 2019-11-11 Drug classification method, device, computer, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910881521.5A CN110781298B (zh) 2019-09-18 2019-09-18 Drug classification method and apparatus, computer device, and storage medium
CN201910881521.5 2019-09-18

Publications (1)

Publication Number Publication Date
WO2020220635A1 (fr) 2020-11-05

Family

ID=69383808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117240 WO2020220635A1 (fr) 2019-09-18 2019-11-11 Drug classification method and apparatus, computer device, and storage medium

Country Status (3)

Country Link
CN (1) CN110781298B (fr)
SG (1) SG11202008417RA (fr)
WO (1) WO2020220635A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906395A (zh) * 2021-03-26 2021-06-04 平安科技(深圳)有限公司 Drug relationship extraction method, apparatus, device, and storage medium
CN117316373A (zh) * 2023-10-08 2023-12-29 医顺通信息科技(常州)有限公司 HIS-based whole-process drug supervision system and method

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
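CN111627566A (zh) * 2020-05-22 2020-09-04 泰康保险集团股份有限公司 Indication information processing method and apparatus, storage medium, and electronic device
CN111738014B (zh) * 2020-06-16 2023-09-08 北京百度网讯科技有限公司 Drug classification method, apparatus, device, and storage medium
CN111832661B (zh) * 2020-07-28 2024-04-02 平安国际融资租赁有限公司 Classification model construction method and apparatus, computer device, and readable storage medium
CN112035664A (zh) * 2020-08-28 2020-12-04 平安医疗健康管理股份有限公司 Drug categorization method and apparatus, and computer device
CN112466476A (zh) * 2020-12-17 2021-03-09 贝医信息科技(上海)有限公司 Epidemiological trend analysis method and apparatus based on drug flow data
CN113488194B (zh) * 2021-05-25 2023-04-07 四川大学华西医院 Drug identification method and apparatus based on a distributed system
CN113569994B (zh) * 2021-08-30 2024-05-21 平安医疗健康管理股份有限公司 Duplicate medical record identification method, apparatus, device, and storage medium
CN113470779B (zh) * 2021-09-03 2021-11-26 壹药网科技(上海)股份有限公司 Drug category identification method and system
TWI781856B (zh) * 2021-12-16 2022-10-21 新加坡商鴻運科股份有限公司 Drug image recognition method, computer device, and storage medium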

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
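CN109408631A (zh) * 2018-09-03 2019-03-01 平安医疗健康管理股份有限公司 Drug data processing method and apparatus, computer device, and storage medium
US20190131007A1 (en) * 2017-11-02 2019-05-02 Ir2Dx, Inc. Systems and Methods for Providing Professional Treatment Guidance for Diabetes Patients
CN109830302A (zh) * 2019-01-28 2019-05-31 北京交通大学 Medication pattern mining method and apparatus, and electronic device
CN110223751A (zh) * 2019-05-16 2019-09-10 平安科技(深圳)有限公司 Prescription evaluation method and system based on a medical knowledge graph, and computer device
CN110245217A (zh) * 2019-06-17 2019-09-17 京东方科技集团股份有限公司 Drug recommendation method and apparatus, and electronic device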

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
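JP5317783B2 (ja) * 2009-03-25 2013-10-16 株式会社東芝 Drug information management device, drug information management method, and drug information management system
CN108831559B (zh) * 2018-06-20 2021-01-15 清华大学 Chinese electronic medical record text analysis method and system
CN108875845B (zh) * 2018-07-26 2024-02-20 广东数相智能科技有限公司 Drug classification device
CN110021439B (zh) * 2019-03-07 2023-01-24 平安科技(深圳)有限公司 Machine-learning-based medical data classification method and apparatus, and computer device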

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190131007A1 (en) * 2017-11-02 2019-05-02 Ir2Dx, Inc. Systems and Methods for Providing Professional Treatment Guidance for Diabetes Patients
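CN109408631A (zh) * 2018-09-03 2019-03-01 平安医疗健康管理股份有限公司 Drug data processing method and apparatus, computer device, and storage medium
CN109830302A (zh) * 2019-01-28 2019-05-31 北京交通大学 Medication pattern mining method and apparatus, and electronic device
CN110223751A (zh) * 2019-05-16 2019-09-10 平安科技(深圳)有限公司 Prescription evaluation method and system based on a medical knowledge graph, and computer device
CN110245217A (zh) * 2019-06-17 2019-09-17 京东方科技集团股份有限公司 Drug recommendation method and apparatus, and electronic device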

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
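CN112906395A (zh) * 2021-03-26 2021-06-04 平安科技(深圳)有限公司 Drug relationship extraction method, apparatus, device, and storage medium
CN112906395B (zh) * 2021-03-26 2023-08-15 平安科技(深圳)有限公司 Drug relationship extraction method, apparatus, device, and storage medium
CN117316373A (zh) * 2023-10-08 2023-12-29 医顺通信息科技(常州)有限公司 HIS-based whole-process drug supervision system and method
CN117316373B (zh) * 2023-10-08 2024-04-12 医顺通信息科技(常州)有限公司 HIS-based whole-process drug supervision system and method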

Also Published As

Publication number Publication date
CN110781298B (zh) 2023-06-20
CN110781298A (zh) 2020-02-11
SG11202008417RA (en) 2020-12-30

Similar Documents

Publication Publication Date Title
WO2020220635A1 (fr) Drug classification method and apparatus, computer device, and storage medium
CN111414393B (zh) Method and device for retrieving semantically similar cases based on a medical knowledge graph
CN108831559B (zh) Chinese electronic medical record text analysis method and system
WO2021047186A1 (fr) Consultation dialogue processing method, apparatus, device, and storage medium
US20200075135A1 (en) Trial planning support apparatus, trial planning support method, and storage medium
US20170083670A1 (en) Drug adverse event extraction method and apparatus
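CN113345577B (zh) Method for generating diagnosis and treatment auxiliary information, model training method, apparatus, device, and storage medium
CN112365939B (zh) Data governance method and system based on medical and health big data
WO2022121163A1 (fr) User behavior tendency identification method, apparatus, and device, and storage medium
CN112820416A (zh) Cohort data typing method for major infectious diseases, typing model, and electronic device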
Si et al. An OMOP CDM-based relational database of clinical research eligibility criteria
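CN116910172A (zh) Artificial-intelligence-based follow-up scale generation method and system
CN111797267A (zh) Medical image retrieval method and system, electronic device, and storage medium
CN115050442A (zh) Disease-type data reporting method and apparatus based on a mining clustering algorithm, and storage medium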
Kaur et al. Image content based retrieval system using cosine similarity for skin disease images
Najadat et al. A classifier to detect abnormality in CT brain images
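CN112071431B (zh) Method and system for automatically generating clinical pathways based on deep learning and knowledge graphs
JP2001175724A (ja) Medical fee statement analysis system
CN115083550B (zh) Patient similarity classification method based on multi-source information
CN106844325A (zh) Medical information processing method and medical information processing device
CN111667023B (zh) Method and apparatus for obtaining articles of a target category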
Rao et al. COVID-19 detection method based on SVRNet and SVDNet in lung x-rays
CN114936153A (zh) Turing test method for artificial intelligence software
Junior et al. A study of the influence of textual features in learning medical prior authorization
CN112016302B (zh) Method and apparatus for identifying split-hospitalization behavior, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19926795; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19926795; Country of ref document: EP; Kind code of ref document: A1)