WO2020220635A1

WO2020220635A1 - Pharmaceutical drug classification method and apparatus, computer device and storage medium

Info

Publication number: WO2020220635A1
Application number: PCT/CN2019/117240
Authority: WO
Inventors: 陈娴娴; 阮晓雯; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-09-18
Filing date: 2019-11-11
Publication date: 2020-11-05
Also published as: CN110781298B; CN110781298A; SG11202008417RA

Abstract

A pharmaceutical drug classification method and apparatus, a computer device and a storage medium, comprising: on the basis of case information of a patient, obtaining target feature word vectors representing the patient's state of illness and the pharmaceutical drugs used by the patient; inputting the target feature word vectors into a preset pharmaceutical drug classification model, wherein the classification model is an unsupervised training model that performs clustering by calculating distances between different feature word vectors; and, on the basis of the cluster set of the used pharmaceutical drugs output by the classification model, performing classification annotation on the used pharmaceutical drugs, wherein the classification annotation is at least one high-frequency word in the cluster set. The present classification method improves classification efficiency, and its use of case information further strengthens the correspondence between the pharmaceutical drug and the state of illness, thus increasing the accuracy of the classification results.

Description

Drug classification method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201910881521.5, filed with the Chinese Patent Office on September 18, 2019 and entitled "Methods, devices, computer equipment and storage media for drug classification", the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of this application relate to the field of drug classification, and in particular to a drug classification method, device, computer equipment and storage medium.

Background art

Drug classification management is an internationally accepted management method. Based on the principles of drug safety and effectiveness, it divides drugs into prescription and non-prescription drugs according to their variety, specification, indications, dosage and route of administration, and sets corresponding management regulations for each category. Its purpose is to ensure that people use medication safely.

In the prior art, drug classification models are mainly supervised models, which require substantial labor costs to label samples in advance. The inventor realized that manual labeling is often inaccurate and the resulting classification is incomplete, so a lot of manpower is needed for maintenance operations such as adding and modifying categories. As a result, drug classification is time-consuming and labor-intensive, and its accuracy is low.

Summary of the invention

The embodiments of the present application provide a drug classification method, device, computer equipment and storage medium that can complete drug classification without manual labeling.

To solve the above technical problems, one technical solution adopted by the embodiments of this application is a drug classification method, including: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and drug use, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information through a neural network model, and the second word vector is obtained statistically after the text information is filtered through stop words; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.

To solve the above technical problems, an embodiment of this application also provides a drug classification device, including: an acquisition module, configured to obtain, according to a user's case information, a target feature word vector that characterizes the user's condition and drug use, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information through a neural network model, and the second word vector is obtained statistically after the text information is filtered through stop words; a processing module, configured to input the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors; and an execution module, configured to classify and label the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.

To solve the above technical problems, an embodiment of this application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of a drug classification method, the method including: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and drug use, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information through a neural network model, and the second word vector is obtained statistically after the text information is filtered through stop words; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.

To solve the above technical problems, an embodiment of this application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of a drug classification method, the method including: obtaining, according to a user's case information, a target feature word vector that characterizes the user's condition and drug use, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information through a neural network model, and the second word vector is obtained statistically after the text information is filtered through stop words; inputting the target feature word vector into a preset drug classification model, wherein the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors; and classifying and labeling the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.

The embodiments of this application improve the efficiency of drug classification, and the use of case information further strengthens the correspondence between drugs and disease conditions, improving the accuracy of the classification results.

Description of the drawings

Fig. 1 is a schematic diagram of the basic flow of the method for classifying drugs according to an embodiment of the application;

FIG. 2 is a schematic diagram of a process of collecting a first word vector through a neural network model according to an embodiment of the application;

FIG. 3 is a schematic diagram of the process of extracting word vectors through a keyword set according to an embodiment of the application;

FIG. 4 is a schematic diagram of a process of generating a first-level cluster set according to an embodiment of the application;

FIG. 5 is a schematic diagram of a process of generating a secondary cluster set according to an embodiment of the application;

FIG. 6 is a schematic diagram of a process of generating a three-level cluster set according to an embodiment of the application;

FIG. 7 is a schematic diagram of three-level classification according to an embodiment of this application;

FIG. 8 is a schematic diagram of the basic structure of a drug classification device according to an embodiment of the application;

Fig. 9 is a block diagram of the basic structure of a computer device according to an embodiment of the application.

Detailed description of the embodiments

Please refer to Fig. 1, which is a schematic diagram of the basic flow of the drug classification method in this embodiment.

As shown in Figure 1, a drug classification method includes:

S1100. Obtain, according to the user's case information, a target feature word vector that characterizes the user's condition and drug use, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is extracted from the text information through a neural network model, and the second word vector is obtained statistically after the text information is filtered through stop words;

When the user seeks medical treatment in a hospital or clinic, the user's behavior information is recorded throughout the process. The recorded behavior information includes the user's medical condition, drug use and laboratory results, and is collectively defined as the user's case information. In this embodiment the case information is entirely text information, but it is not limited to this: depending on the specific application scenario, in some embodiments the case information further includes picture information and sound information.

After the case information is obtained, the target feature word vector, i.e., vector information characterizing the user's condition and drug use, is extracted from it. The target feature word vector can be extracted by a neural network model that has been trained to a convergent state. In some embodiments, it can be extracted by calculating the word frequency of keywords in the case information. In some embodiments, features are first extracted through the neural network model, the word frequency statistics method is then applied, and finally the results of the two methods are combined to obtain the target feature word vector.

S1200. Input the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;

The target feature word vector is input into a preset drug classification model, where the drug classification model is an unsupervised training model for clustering by calculating the distance between different feature word vectors.

In this embodiment, the drug classification model is an unsupervised training model used to cluster feature word vectors. It mainly calculates the inter-class distance between different feature word vectors, sets a distance threshold as the measure of similarity, and clusters the feature word vectors whose inter-class distance is less than the distance threshold into a cluster set. By clustering a large number of feature word vectors, including the target feature word vector, multiple cluster sets are formed, and each cluster set is one classification category of drugs.
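A minimal sketch of threshold-based clustering as described above, assuming Euclidean distance and single linkage (a vector joins a cluster set if it is within the threshold of any member of that set). The patent does not specify the linkage rule or the algorithm used, and `threshold_cluster` is a hypothetical name:

```python
import math

def threshold_cluster(vectors, threshold):
    """Group indices of `vectors` whose Euclidean distance is below `threshold`.

    Single linkage via union-find: if a-b and b-c are each closer than the
    threshold, a, b and c land in the same cluster set.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two cluster sets

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical 2-D feature word vectors for five drugs.
vecs = [(0.0, 0.0), (0.3, 0.0), (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
print(threshold_cluster(vecs, threshold=0.5))  # [[0, 1], [2, 3], [4]]
```

Single linkage is only one possible reading; a centroid- or density-based rule would give different cluster sets for the same threshold.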

Calculating the inter-class distance is, in effect, calculating the similarity of the condition information associated with different drugs. The smaller the inter-class distance, the closer the efficacy of the drugs; the larger the distance, the greater the difference in their efficacy. Different classification categories therefore correspond to different treatments or curative effects.

In some embodiments, to further refine the classification of drugs, the classification categories are divided into levels: after the first-level division is completed, further classification is performed within each cluster. This is done by reducing the distance threshold so that the feature word vectors within a cluster set are further distinguished. At the same time, reducing the parameter value of the effective point spacing of the feature word vectors makes their intra-class distances converge further, which in turn increases the inter-class distances between feature word vectors. This increases the differentiation between feature word vectors within a cluster and provides a good basis for subdividing the categories within it.

Following this method, by continuously adjusting the distance threshold and the effective point spacing, further refined classification can be performed within clusters at each level, forming hierarchically distributed classification categories. In some embodiments the cluster sets are divided into three levels, but this is not limiting: depending on the specific application scenario, the cluster sets can be divided into one, two, four, five or more levels.

S1300. Classify and label the used drugs according to the cluster set of the used drugs output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the used drugs.

According to the cluster sets output by the drug classification model, each cluster set and the drugs within it are labeled. A cluster set is labeled by extracting the word that occurs most frequently in the case information of the drugs in that cluster set and using it as the cluster set's label name. In some embodiments, when multiple label names are required, they are selected in descending order of frequency of occurrence. Each drug is labeled with its drug name, which is extracted directly from the case information.
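The labeling rule can be sketched with Python's `collections.Counter`, assuming the case information of each drug in a cluster set has already been tokenized and stop-word filtered; `label_cluster` and the sample tokens are hypothetical:

```python
from collections import Counter

def label_cluster(case_texts, n_labels=1):
    """Return the n most frequent words across a cluster set's case information.

    Ties keep the order in which words were first seen (Counter.most_common).
    """
    counts = Counter(word for words in case_texts for word in words)
    return [word for word, _ in counts.most_common(n_labels)]

# Hypothetical tokenized case information for the drugs in one cluster set.
cluster = [
    ["hypertension", "headache", "amlodipine"],
    ["hypertension", "dizziness", "nifedipine"],
    ["hypertension", "headache", "losartan"],
]
print(label_cluster(cluster, n_labels=2))  # ['hypertension', 'headache']
```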

In the above embodiment, when classifying drugs, the drug names and the corresponding condition information are obtained by collecting the user's case information. The condition information corresponding to each drug name is converted into a target feature word vector, which is input into the unsupervised drug classification model. The model clusters drugs that treat the same or similar conditions together to form a cluster category, and each cluster category becomes one drug classification category. Finally, drug classification is completed by labeling the names of the drugs in each category. This classification method improves the efficiency of drug classification, and the use of case information further strengthens the correspondence between drugs and disease conditions, improving the accuracy of the classification results.

In some embodiments, it is necessary to perform feature extraction on case information through a neural network model. Please refer to FIG. 2, which is a schematic diagram of the process of collecting the first word vector through the neural network model in this embodiment.

As shown in Figure 2, S1100 includes:

S1111, converting the case information into a behavior vector set;

The case information is converted into a vector set that the neural network model can recognize and process. In this embodiment the conversion is done with a word2vec model, but this is not limiting: depending on the specific application scenario, in some embodiments the case information can also be vectorized using TF-IDF (term frequency-inverse document frequency).

S1112, input the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model pre-trained to a convergent state and used to extract, from the behavior vector set, the vectors that represent user behavior;

The converted vector set is input into the preset feature extraction model, which is a neural network model pre-trained to a convergent state and used to extract the vectors representing user behavior from the behavior vector set.

In this embodiment, the feature extraction model is used to extract word vectors associated with user medical information and drug information in the vector set.

For the feature extraction model to accurately extract the word vectors associated with the user's condition information and drug information, it must be trained. The training method is as follows. A training sample set is collected, consisting of vector sets converted from multiple pieces of case information, and the word vectors in each vector set are manually labeled. The labeled vector sets are then input into the neural network model in turn. After the model outputs an excitation word vector, the distance between the excitation word vector and the labeled word vector is calculated; if the distance is greater than a set distance threshold, the weights of the neural network model are corrected through the back-propagation algorithm. These steps are repeated until the distance between the excitation word vector and the labeled word vector is less than the set distance threshold, at which point training on that vector set passes. All vector sets in the training sample set are trained in this way until the accuracy with which the neural network model extracts word vectors exceeds a set value (for example, 98%). Training then ends, and the trained neural network model is the feature extraction model.
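The training loop above can be sketched as follows. This is an illustration under strong simplifying assumptions, not the patent's model: the neural network is reduced to a single linear layer `W`, back-propagation reduces to the gradient of the squared Euclidean distance, and "accuracy" is taken to mean the fraction of samples whose output lies within the distance threshold. `train_extractor` and every default value except the 98% target are hypothetical:

```python
import numpy as np

def train_extractor(inputs, targets, dist_threshold=0.05, target_accuracy=0.98,
                    lr=0.5, max_epochs=1000, max_steps=100, seed=0):
    """Per-sample correction until within dist_threshold; stop at target accuracy."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(inputs.shape[1], targets.shape[1]))
    for epoch in range(max_epochs):
        for x, y in zip(inputs, targets):
            for step in range(max_steps):
                out = x @ W                            # "excitation" word vector
                if np.linalg.norm(out - y) < dist_threshold:
                    break                              # this sample passes
                W -= lr * np.outer(x, out - y)         # gradient of 0.5*||xW - y||^2
        dists = np.linalg.norm(inputs @ W - targets, axis=1)
        if np.mean(dists < dist_threshold) > target_accuracy:
            break                                      # accuracy above the set value
    return W

# Demo on synthetic data produced by a known linear map.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # unit-norm rows keep lr stable
Y = X @ rng.normal(size=(4, 3))
W = train_extractor(X, Y)
print(np.linalg.norm(X @ W - Y, axis=1).max())
```

If training converges within `max_epochs`, every row of `X @ W` lies within the distance threshold of its label. A real feature extraction model would have hidden layers and a framework-managed backward pass.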

The feature extraction model trained to the convergent state can accurately extract the word vector associated with the user's condition information and drug information in the vector set, and the word vector is the user behavior vector.

S1113. Read the user behavior vector output by the feature extraction model, and define the user behavior vector as a first word vector.

Read the user behavior vector output by the feature extraction model. Since the user behavior vector represents the word vectors associated with the user's condition information and drug information, it can be used as input data for the drug classification model. In this embodiment, the user behavior vector is defined as the first word vector.

The neural network model trained to convergence can quickly extract word vectors that record key information, simplifying the data processing procedures of the drug classification model, and improving the processing efficiency of the drug classification model.

In some embodiments, to capture more fully the user's condition information and drug information recorded in the case information and to reduce the rate at which key information is missed, the key information needs to be further extracted. Please refer to FIG. 3, which is a schematic diagram of the process of extracting word vectors through keyword sets in this embodiment.

As shown in Figure 3, after S1113, it includes:

S1121, filter the case information through a preset stop word list to generate a keyword set;

In this embodiment, to filter out information in the case information that is irrelevant to the user's condition information and drug information, the case information is filtered using stop words, i.e., words that are to be removed.

A stop word list is created to record stop words obtained through statistics. For example, if verbs, adverbs and adjectives are set as stop words, then after filtering through the stop word list, words with these parts of speech are removed from the case information. The case information remaining after stop-word removal forms the keyword set, which records the user's condition information and drug information.
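A minimal stop-word filter, assuming the case information is already tokenized. The patent's example removes verbs, adverbs and adjectives; a real system would identify them with a part-of-speech tagger, whereas this sketch simply lists a few hypothetical stop words by hand (`STOP_WORDS` and `keyword_set` are illustrative names):

```python
# Hypothetical stop-word list standing in for POS-based filtering.
STOP_WORDS = {"took", "felt", "very", "slightly", "severe", "mild"}

def keyword_set(tokens, stop_words=STOP_WORDS):
    """Drop stop words from tokenized case information, preserving order."""
    return [t for t in tokens if t.lower() not in stop_words]

case = ["Patient", "felt", "severe", "headache", "took", "ibuprofen"]
print(keyword_set(case))  # ['Patient', 'headache', 'ibuprofen']
```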

S1122, Count the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

After the keyword set is obtained by filtering, the word frequency of each keyword in the keyword set is calculated. The word frequency is calculated as:
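The formula itself is not reproduced in this text. The conventional term-frequency definition, assumed here rather than taken from the source, is shown below, where n_{i,j} is the number of times keyword t_i occurs in case record d_j and the denominator is the total number of words in d_j:

```latex
\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}
```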

After the word frequency of each keyword is calculated, the inverse document frequency of each keyword is calculated. The inverse document frequency measures the importance of each keyword: generally, the more common a word is, the smaller its inverse document frequency. The inverse document frequency is calculated as:
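As with the word frequency, the formula is missing from this text. A common smoothed definition, assumed here, is given below together with the priority value of S1123 (the product of word frequency and inverse document frequency), where |D| is the total number of case records, the set in the denominator contains the records d_j in which keyword t_i appears, and TF_{i,j} is the word frequency of t_i in record d_j:

```latex
\mathrm{IDF}_{i} = \log \frac{|D|}{1 + \lvert \{\, j : t_i \in d_j \,\} \rvert}
\qquad
\mathrm{priority}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_{i}
```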

S1123: Calculate the priority value of each keyword according to the word frequency and the inverse document frequency;

After the word frequency and inverse document frequency of each keyword are calculated, they are multiplied to obtain the keyword's priority value. The keywords are sorted in descending order of priority value, and the top-ranked keywords are selected as the keywords to be converted according to actual needs; for example, the top 20 keywords are extracted. The number of keywords to be converted is not limited to this: depending on the specific application scenario, in some embodiments it can be any value.
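The priority computation and top-k selection can be sketched in plain Python. The TF and IDF forms below (raw count over record length, and log(N / (1 + document frequency))) are common choices assumed for illustration, since the source's formulas are not reproduced; the function names and sample records are hypothetical:

```python
import math
from collections import Counter

def tfidf_priorities(docs, doc_index):
    """Priority = TF * IDF for every keyword of docs[doc_index].

    TF  = occurrences of the keyword in the record / record length
    IDF = log(N / (1 + number of records containing the keyword))
    """
    doc = docs[doc_index]
    counts = Counter(doc)
    n_docs = len(docs)
    priorities = {}
    for word, count in counts.items():
        df = sum(1 for d in docs if word in d)
        priorities[word] = (count / len(doc)) * math.log(n_docs / (1 + df))
    return priorities

def top_keywords(priorities, k=20):
    """Sort keywords by priority in descending order and keep the first k."""
    return sorted(priorities, key=priorities.get, reverse=True)[:k]

# Hypothetical keyword sets from four case records.
docs = [
    ["hypertension", "amlodipine", "hypertension", "headache"],
    ["diabetes", "metformin", "headache"],
    ["asthma", "salbutamol", "headache"],
    ["hypertension", "losartan"],
]
p = tfidf_priorities(docs, 0)
print(top_keywords(p, k=2))  # ['amlodipine', 'hypertension']
```

Here "amlodipine" outranks "hypertension" because it appears in only one record, while "headache", present in three of the four records, gets an inverse document frequency of log(4/4) = 0.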

S1124: Generate the second word vector according to the priority value of each keyword.

The keywords are sorted in descending order of priority value, and the top-ranked keywords are selected as the keywords to be converted according to actual needs. These keywords are then converted into the second word vector through the word2vec model or TF-IDF.

In some embodiments, the first word vector is extracted by the neural network model. The relationship between the word vectors extracted by a neural network model and the text information essentially carries human subjective intent, since it is learned through repeated directed training; however, neural network models have difficulty converging when cross-trained on multiple association relationships, so the extracted first word vector may be insufficiently comprehensive or may omit keyword vectors. The second word vector, by contrast, is computed statistically after stop-word filtering without any subjective intent; it reflects the distribution of each keyword most directly and extracts word vectors more comprehensively, but without emphasis. Merging the first word vector with the second word vector yields a target feature word vector with more comprehensive data: it highlights the feature word vectors people pay attention to while fully incorporating the feature word vectors that objectively exist, so the extracted data is both comprehensive and focused, which helps improve the accuracy of the drug classification model. See step S1131.

S1131. Combine the first word vector and the second word vector to generate the target feature word vector.

The first word vector and the second word vector are merged by adding the word vector matrix formed by the first word vectors to the word vector matrix formed by the second word vectors. The resulting vector matrix is the target feature word vector, which serves as the input data of the drug classification model.
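A sketch of the merge step with NumPy. It assumes, beyond what the source states, that both matrices have the same shape with rows aligned to the same words (for example, zero rows for words that only one method produced); `merge_word_vectors` is a hypothetical name:

```python
import numpy as np

def merge_word_vectors(first, second):
    """Element-wise sum of two aligned word-vector matrices (rows = words)."""
    if first.shape != second.shape:
        raise ValueError(f"shape mismatch: {first.shape} vs {second.shape}")
    return first + second

# Hypothetical 2-word, 2-dimensional matrices from the two extraction paths.
first_vecs = np.array([[0.2, 0.1], [0.0, 0.4]])   # neural network model (S1113)
second_vecs = np.array([[0.1, 0.0], [0.3, 0.1]])  # keyword statistics (S1124)
target = merge_word_vectors(first_vecs, second_vecs)
print(target)
```

Plain addition is what the text describes; concatenation or a weighted sum would be alternatives if the two matrices had different dimensions.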

In some embodiments, the drug classification model generates a first-level cluster set, and the cluster set to which the target feature word vector belongs is determined by calculating the Euclidean distance between the target feature word vector and different feature word vectors. Please refer to FIG. 4, which is a schematic diagram of the process of generating a first-level cluster set in this embodiment.

As shown in Figure 4, S1200 includes:

S1211, calculate the first Euclidean distance between the target feature word vector and different feature word vectors;

When the drug classification model classifies the target feature word vector, it needs to calculate the distance between the target feature word vector and other feature word vectors. Specifically, it calculates the Euclidean distances between the target feature word vector and different feature word vectors, collectively referred to as the first Euclidean distance. This is not limiting: in some embodiments, the Mahalanobis distance or the cosine distance between the target feature word vector and different feature word vectors is calculated instead.
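The three distance measures named above can be written with NumPy as follows. For the Mahalanobis distance, the covariance matrix is an input the caller must supply (for example, estimated from the feature word vectors with `np.cov(X, rowvar=False)`); the pseudo-inverse is used so the sketch does not fail on a singular covariance. Function names are hypothetical:

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two feature word vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mahalanobis(a, b, cov):
    """Distance scaled by the (pseudo-)inverse covariance of the feature space."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(euclidean(a, b), cosine_distance(a, b), mahalanobis(a, b, 4 * np.eye(2)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance, which is why the Euclidean form is the natural default.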

S1212. Compare the first Euclidean distance with a preset first distance threshold;

The first Euclidean distance between the target feature word vector and each feature word vector is compared with the preset first distance threshold, which measures whether the feature word vectors meet the first screening condition; for example, the first distance threshold is 0.5.

By comparing the first Euclidean distance with the preset first distance threshold, it can be determined which feature word vector, or which type of feature word vector, the target feature word vector should be clustered with.

S1213: When the first Euclidean distance is less than the first distance threshold, cluster the target feature word vector into the cluster set indicated by the first Euclidean distance to generate a first-level cluster set.

When the comparison shows that the first Euclidean distance between the target feature word vector and a given feature word vector is less than the first distance threshold, the target feature word vector should be clustered into the cluster set containing that feature word vector. After the target feature word vectors of all case information have been clustered, a first-level cluster set is generated, consisting of at least one cluster set.
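Steps S1211 to S1213 for a single target vector can be sketched as below. The source does not say what happens when several clusters qualify or when none does, so this sketch assumes the cluster holding the nearest member wins and that an unmatched vector starts a new cluster; `assign_to_cluster` is hypothetical:

```python
import math

def assign_to_cluster(target, clusters, threshold=0.5):
    """Add `target` to the cluster holding its nearest member if that member is
    closer than `threshold`; otherwise start a new cluster.

    `clusters` is a list of non-empty lists of vectors and is modified in place.
    Returns the index of the cluster the target joined.
    """
    best_idx, best_dist = None, math.inf
    for idx, members in enumerate(clusters):
        # S1211: distance from the target to every member; keep the smallest.
        d = min(math.dist(target, m) for m in members)
        if d < best_dist:
            best_idx, best_dist = idx, d
    # S1212-S1213: compare with the first distance threshold.
    if best_idx is not None and best_dist < threshold:
        clusters[best_idx].append(target)
        return best_idx
    clusters.append([target])
    return len(clusters) - 1

clusters = [[(0.0, 0.0), (0.3, 0.0)], [(5.0, 5.0)]]
print(assign_to_cluster((0.5, 0.1), clusters))  # 0: within 0.5 of (0.3, 0.0)
print(assign_to_cluster((9.0, 0.0), clusters))  # 2: no member close enough
```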

In some embodiments, the drug classification model generates a secondary cluster set, which requires further refined clustering on the basis of the first-level cluster set. Please refer to FIG. 5, which is a schematic diagram of the process of generating a secondary cluster set in this embodiment.

As shown in Figure 5, after S1213, it includes:

S1221. Correct the parameter value of the effective point spacing in the drug classification model to generate a first parameter value, and calculate the second Euclidean distance between the target feature word vector and different feature word vectors in the first-level cluster set;

Before the drug classification model performs second-level clustering, the effective point spacing parameter in the model needs to be adjusted. The effective point spacing refers to the inter-class distances in each feature word vector that are not ignored. For efficiency of data computation, the intra-class distances are filtered before the inter-class distances are calculated; this is done by setting the parameter value of the effective point spacing, and any inter-class distance smaller than the effective point spacing is judged invalid. Therefore, reducing the effective point spacing increases the diversity of intra-class distances, reveals finer detail in each feature word vector, and increases the difference between feature word vectors in the same cluster, which benefits second-level clustering. The corrected effective point spacing is the first parameter value, which is smaller than the effective point spacing used by the drug classification model for the first-level cluster set.

After the first parameter value is set, the drug classification model performs second-level clustering within each cluster of the first-level cluster set. The second-level clustering method is: within the cluster set containing the target feature word vector, calculate the second Euclidean distance between the target feature word vector and the other feature word vectors. This is not limiting: in some embodiments, the Mahalanobis distance or the cosine distance between the target feature word vector and different feature word vectors is calculated instead of the second Euclidean distance.

S1222. Compare the second Euclidean distance with a preset second distance threshold, where the second distance threshold is less than the first distance threshold;

The second Euclidean distance between the target feature word vector and each feature word vector is compared with the preset second distance threshold, which measures whether the feature word vectors meet the second screening condition; for example, the second distance threshold is 0.1.

By comparing the second Euclidean distance with the preset second distance threshold, it can be determined which feature word vector, or which type of feature word vector, within its own cluster set the target feature word vector should be clustered with.

S1223: When the second Euclidean distance is less than the second distance threshold, cluster the target feature word vector into the cluster set indicated by the second Euclidean distance to generate a secondary cluster set.

When the comparison shows that the second Euclidean distance between the target feature word vector and a given feature word vector is less than the second distance threshold, the target feature word vector should be clustered into the cluster set containing that feature word vector. After the feature word vectors in all cluster sets have been clustered, a secondary cluster set is generated, consisting of at least one cluster set.
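To make the two-level procedure concrete, here is a sketch that first clusters with the example first distance threshold (0.5) and then re-clusters inside each first-level cluster with the example second distance threshold (0.1). It assumes single-linkage grouping and Euclidean distance, and it does not model the effective point spacing parameter; `single_link` and `two_level` are hypothetical names:

```python
import math

def single_link(points, threshold):
    """Split `points` into index groups connected by distances < threshold."""
    unvisited = list(range(len(points)))
    groups = []
    while unvisited:
        stack = [unvisited.pop(0)]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            near = [j for j in unvisited if math.dist(points[i], points[j]) < threshold]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
        groups.append(sorted(group))
    return groups

def two_level(points, first_threshold=0.5, second_threshold=0.1):
    """First-level clusters, then re-cluster inside each with a smaller threshold."""
    result = []
    for level1 in single_link(points, first_threshold):
        sub_points = [points[i] for i in level1]
        # Map local indices inside the sub-cluster back to global indices.
        level2 = [[level1[k] for k in g] for g in single_link(sub_points, second_threshold)]
        result.append(level2)
    return result

points = [(0.0, 0.0), (0.05, 0.0), (0.3, 0.0), (0.35, 0.0), (5.0, 5.0)]
print(two_level(points))  # [[[0, 1], [2, 3]], [[4]]]
```

The first four vectors form one first-level cluster, which the tighter threshold splits into two secondary clusters; a third level would repeat the same step with a still smaller threshold.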

In some embodiments, the drug classification model further generates a three-level cluster set by refining the clustering of the secondary cluster set. Please refer to FIG. 6, which is a schematic flowchart of generating a three-level cluster set in this embodiment.

As shown in FIG. 6, after S1223, the method further includes:

S1231. Correct the parameter value of the effective point spacing in the drug classification model to generate a second parameter value, and calculate, within the secondary cluster set, the third Euclidean distance between the target feature word vector and the other feature word vectors, where the second parameter value is smaller than the first parameter value;

Before the drug classification model performs three-level clustering, the effective point spacing parameter in the drug classification model needs to be adjusted. The effective point spacing is the minimum distance between feature word vectors that is still taken into account; for computational efficiency, the intra-class distances are filtered before the inter-class distances are calculated. The filtering is controlled by the parameter value of the effective point spacing: any distance smaller than the effective point spacing is judged invalid. Decreasing the parameter value of the effective point spacing therefore increases the diversity of the intra-class distances, reveals finer details of each feature word vector, and enlarges the differences between feature word vectors in the same cluster, which is conducive to three-level clustering. The corrected parameter value of the effective point spacing is the second parameter value, which is smaller than the first parameter value.
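The filtering role of the effective point spacing can be illustrated with a small sketch. It assumes, hypothetically, that the filtered quantities are the pairwise Euclidean distances inside one cluster; the data layout, the function name, and the parameter values used below are illustrative and do not come from this application.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def effective_distances(
    members: List[str],
    vectors: Dict[str, Sequence[float]],
    spacing: float,
) -> List[Tuple[str, str, float]]:
    """Pairwise Euclidean distances inside one cluster that are still valid.

    A distance smaller than the effective point spacing `spacing` is judged
    invalid and dropped, so a smaller spacing keeps more (and finer) distances.
    """
    kept: List[Tuple[str, str, float]] = []
    for a, b in combinations(members, 2):
        d = math.dist(vectors[a], vectors[b])
        if d >= spacing:
            kept.append((a, b, d))
    return kept
```

For example, with `a=(0, 0)`, `b=(0.04, 0)`, `c=(0.3, 0)`, a hypothetical first parameter value of 0.08 keeps only the pairs `(a, c)` and `(b, c)`, while a hypothetical second parameter value of 0.03 also keeps the short `(a, b)` distance, which is the finer detail the paragraph above refers to.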

After the second parameter value is set, the drug classification model performs three-level clustering within each cluster of the secondary cluster set. The three-level clustering proceeds as follows: within the cluster set containing the target feature word vector, the third Euclidean distance between the target feature word vector and each of the other feature word vectors is calculated. The method is not limited to the Euclidean distance; in some embodiments, the Mahalanobis distance or the cosine distance between the target feature word vector and the other feature word vectors may be calculated instead.

S1232. Compare the third Euclidean distance with a preset third distance threshold, where the third distance threshold is smaller than the second distance threshold;

The third Euclidean distance between the target feature word vector and each of the other feature word vectors is compared with the preset third distance threshold. The third distance threshold is the criterion for deciding whether two feature word vectors meet the third screening condition; for example, the third distance threshold may be set to 0.05.

Comparing the third Euclidean distance with the preset third distance threshold determines which feature word vector, or which group of feature word vectors, within its cluster set the target feature word vector should be clustered with.

S1233. When the third Euclidean distance is less than the third distance threshold, cluster the target feature word vector into the cluster set indicated by the third Euclidean distance to generate a three-level cluster set.

If the comparison shows that the third Euclidean distance between the target feature word vector and a given feature word vector is less than the third distance threshold, the target feature word vector is clustered into the cluster set containing that feature word vector. Once the feature word vectors in all clusters have been clustered in this way, a three-level cluster set is generated, consisting of at least one cluster set. This completes the three-level classification of the drugs. The number of classification levels is not limited to three; in some embodiments, the parameter value of the effective point spacing and the distance threshold can be corrected again to refine the classification further.

Please refer to FIG. 7, which is a schematic diagram of the three-level classification in this embodiment.

As shown in Figure 7, the classification of drugs is divided into three levels, namely: a first-level cluster set 11, a second-level cluster set 12, and a third-level cluster set 13. The cluster sets of three different levels are arranged in a dendrogram.
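The three-level structure of FIG. 7 can be sketched as repeated splitting with a strictly decreasing sequence of distance thresholds, where each level refines the clusters of the level above. The sketch assumes single-linkage splitting and plain Euclidean distance (`math.dist`, Python 3.8+), does not model the effective point spacing, and uses hypothetical function names; in the usage example, the first-level threshold 0.5 is a hypothetical value, while 0.1 and 0.05 are the example second and third thresholds given above.

```python
import math
from typing import Dict, List, Sequence

Vectors = Dict[str, Sequence[float]]


def split(members: List[str], vectors: Vectors, threshold: float) -> List[List[str]]:
    """Single-linkage split of one cluster: connected components of the
    graph whose edges join drugs closer than `threshold`."""
    groups: List[List[str]] = []
    unvisited = list(members)
    while unvisited:
        stack = [unvisited.pop(0)]
        group: List[str] = []
        while stack:
            m = stack.pop()
            group.append(m)
            near = [o for o in unvisited if math.dist(vectors[m], vectors[o]) < threshold]
            for o in near:
                unvisited.remove(o)
            stack.extend(near)
        groups.append(sorted(group))
    return groups


def build_levels(vectors: Vectors, thresholds: List[float]) -> List[List[List[str]]]:
    """One list of clusters per level; each level refines the level above."""
    if any(lower >= upper for upper, lower in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must strictly decrease from level to level")
    levels: List[List[List[str]]] = []
    current = [sorted(vectors)]  # start from a single cluster holding every drug
    for threshold in thresholds:
        current = [sub for cluster in current for sub in split(cluster, vectors, threshold)]
        levels.append(current)
    return levels
```

For example, with vectors `a=(0, 0)`, `b=(0.03, 0)`, `c=(0.09, 0)`, `e=(0.3, 0)`, `d=(1, 1)` and thresholds `[0.5, 0.1, 0.05]`, the levels are `[[a, b, c, e], [d]]`, then `[[a, b, c], [e], [d]]`, then `[[a, b], [c], [e], [d]]`; read top to bottom, they form the branches of a dendrogram.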

In order to solve the above technical problem, the embodiment of the present application also provides a medicine classification device.

Please refer to FIG. 8 for details. FIG. 8 is a schematic diagram of the basic structure of the medicine classification device of this embodiment.

As shown in FIG. 8, the drug classification device includes an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module 2100 is configured to acquire, according to the user's case information, the target feature word vector that characterizes the user's condition and the drugs used, where the case information is text information and the target feature word vector includes a first word vector and a second word vector; the first word vector is obtained by extracting features from the text information with a neural network model, and the second word vector is obtained by filtering stop words out of the text information and then computing statistics on the remaining words. The processing module 2200 is configured to input the target feature word vector into the preset drug classification model, which is an unsupervised training model that performs clustering by calculating the distances between different feature word vectors. The execution module 2300 is configured to classify and label the drugs used according to the cluster set of the drugs used output by the drug classification model, where the content of the classification label is at least one high-frequency word in that cluster set.
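The labeling performed by the execution module 2300 can be sketched with a word counter. This sketch assumes, hypothetically, that each drug in a cluster comes with the condition keywords extracted from its case information after stop-word filtering; the function name, the `top_k` parameter, and the sample data are illustrative.

```python
from collections import Counter
from typing import Dict, List


def label_cluster(cluster: List[str], drug_terms: Dict[str, List[str]], top_k: int = 1) -> List[str]:
    """Classification label for one cluster: its `top_k` most frequent terms.

    `drug_terms` maps each drug to the condition keywords taken from the case
    information in which it was used (hypothetical input format).
    """
    counts = Counter(term for drug in cluster for term in drug_terms.get(drug, []))
    # most_common() orders equal counts by first occurrence.
    return [term for term, _ in counts.most_common(top_k)]
```

With `terms = {"drug_a": ["fever", "headache"], "drug_b": ["fever", "muscle pain"], "drug_c": ["fever"]}`, calling `label_cluster(["drug_a", "drug_b", "drug_c"], terms)` returns `["fever"]`, and `top_k=2` returns `["fever", "headache"]` because ties are broken by first occurrence.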

When the drug classification device classifies drugs, it collects the user's case information to obtain the drug names and the condition information corresponding to each drug, converts the condition information corresponding to each drug name into a target feature word vector, and inputs the target feature word vector into the unsupervised drug classification model. The drug classification model clusters together drugs that treat the same or similar conditions to form a cluster category, and each cluster category becomes one category of the drug classification. Finally, the drug classification is completed by labeling the drug names in each classification category. This classification method improves the efficiency of drug classification, and using case information further strengthens the correspondence between drugs and conditions, improving the accuracy of the classification results.

In some embodiments, the target feature word vector includes the first word vector, and the drug classification device includes a first conversion submodule, a first processing submodule, and a first execution submodule. The first conversion submodule is configured to convert the case information into a behavior vector set; the first processing submodule is configured to input the behavior vector set into a preset feature extraction model, which is a neural network model pre-trained to a convergent state and used to extract, from the behavior vector set, a user behavior vector that represents the set; the first execution submodule is configured to read the user behavior vector output by the feature extraction model and define it as the first word vector.
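The data flow that produces the first word vector can be sketched as follows. The application does not specify the network architecture, so this sketch uses a hypothetical stand-in (element-wise mean pooling followed by one dense layer with `tanh`) and takes the weights as arguments in place of the parameters of a model trained to convergence; how case text is turned into behavior vectors is left out.

```python
import math
from typing import List, Sequence


def extract_user_vector(
    behavior_vectors: List[Sequence[float]],
    weights: List[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Map a behavior vector set to one user behavior vector (the first word vector).

    Hypothetical stand-in network: average the set element-wise, then apply a
    dense layer (one weight row per output dimension) followed by tanh.
    """
    if not behavior_vectors:
        raise ValueError("behavior vector set is empty")
    n, dim = len(behavior_vectors), len(behavior_vectors[0])
    pooled = [sum(v[i] for v in behavior_vectors) / n for i in range(dim)]
    return [
        math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]
```

Whatever the real network is, the interface is the point here: a variable-size set of behavior vectors goes in and one fixed-length user behavior vector comes out.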

In some embodiments, the target feature word vector includes the second word vector, and the drug classification device includes a first filtering submodule, a second processing submodule, a first calculation submodule, and a second execution submodule. The first filtering submodule is configured to filter the case information through a preset stop word list to generate a keyword set; the second processing submodule is configured to count the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword; the first calculation submodule is configured to calculate the priority value of each keyword from its word frequency and inverse document frequency; the second execution submodule is configured to generate the second word vector according to the priority value of each keyword.
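The keyword priority computation can be sketched as TF-IDF. This is a minimal sketch assuming the case records have already been segmented into tokens, that word frequency is normalized by the number of keywords in the record, that inverse document frequency is `log(N / df)`, and that the priority value is their product; other TF-IDF variants are equally plausible, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def filter_keywords(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Keyword set of one case record: its tokens minus the stop word list."""
    return [t for t in tokens if t not in stop_words]


def second_word_vector(
    record_index: int,
    records: List[List[str]],
    stop_words: Set[str],
) -> Tuple[List[str], List[float]]:
    """Return (vocabulary, priority values) for one case record.

    priority = tf * idf, with tf = count / number of keywords in the record
    and idf = log(N / df), N being the number of records.
    """
    keyword_sets = [filter_keywords(r, stop_words) for r in records]
    n_records = len(keyword_sets)
    # Document frequency: in how many records each keyword appears.
    df = Counter(term for kws in keyword_sets for term in set(kws))
    keywords = keyword_sets[record_index]
    counts = Counter(keywords)
    priority: Dict[str, float] = {
        term: (c / len(keywords)) * math.log(n_records / df[term])
        for term, c in counts.items()
    }
    vocabulary = sorted(df)
    return vocabulary, [priority.get(term, 0.0) for term in vocabulary]
```

With the records `[["the", "fever", "fever", "cough"], ["the", "rash"]]` and stop words `{"the"}`, record 0 yields the vocabulary `["cough", "fever", "rash"]` and the vector `[ln 2 / 3, 2 ln 2 / 3, 0.0]`. A keyword that occurs in every record gets an inverse document frequency of 0 and therefore no priority.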

In some embodiments, the drug classification device includes: a first merging sub-module for merging the first word vector and the second word vector to generate the target feature word vector.
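The application does not state how the two vectors are merged. One plausible reading, sketched here as an assumption, is concatenation; the optional L2 normalization of each part is an added, hypothetical design choice so that neither part dominates the Euclidean distances computed later.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def merge_vectors(first: Sequence[float], second: Sequence[float], normalize: bool = True) -> List[float]:
    """Target feature word vector = first word vector followed by second word vector."""
    if normalize:
        first, second = l2_normalize(first), l2_normalize(second)
    return list(first) + list(second)
```

For example, `merge_vectors([3.0, 4.0], [0.0, 2.0])` returns `[0.6, 0.8, 0.0, 1.0]`, while `normalize=False` simply concatenates the raw values.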

In some embodiments, the drug classification device includes a first calculation submodule, a first comparison submodule, and a third execution submodule. The first calculation submodule is configured to calculate the first Euclidean distance between the target feature word vector and the other feature word vectors; the first comparison submodule is configured to compare the first Euclidean distance with a preset first distance threshold; the third execution submodule is configured to, when the first Euclidean distance is less than the first distance threshold, cluster the target feature word vector into the cluster set indicated by the first Euclidean distance to generate a first-level cluster set.

In some embodiments, the drug classification device includes a second calculation submodule, a second comparison submodule, and a fourth execution submodule. The second calculation submodule is configured to correct the parameter value of the effective point spacing in the drug classification model to generate the first parameter value, and to calculate, within the first-level cluster set, the second Euclidean distance between the target feature word vector and the other feature word vectors; the second comparison submodule is configured to compare the second Euclidean distance with a preset second distance threshold, where the second distance threshold is less than the first distance threshold; the fourth execution submodule is configured to, when the second Euclidean distance is less than the second distance threshold, cluster the target feature word vector into the cluster set indicated by the second Euclidean distance to generate a secondary cluster set.

In some embodiments, the drug classification device includes a third calculation submodule, a third comparison submodule, and a fifth execution submodule. The third calculation submodule is configured to correct the parameter value of the effective point spacing in the drug classification model to generate the second parameter value, and to calculate, within the secondary cluster set, the third Euclidean distance between the target feature word vector and the other feature word vectors, where the second parameter value is less than the first parameter value; the third comparison submodule is configured to compare the third Euclidean distance with a preset third distance threshold, where the third distance threshold is less than the second distance threshold; the fifth execution submodule is configured to, when the third Euclidean distance is less than the third distance threshold, cluster the target feature word vector into the cluster set indicated by the third Euclidean distance to generate a three-level cluster set.

In order to solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 9 for details. FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.

FIG. 9 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium may be volatile or non-volatile. The storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store sequences of control information, and when the computer-readable instructions are executed by the processor, the processor implements a drug classification method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. Computer-readable instructions may also be stored in the memory of the computer device; when they are executed by the processor, they cause the processor to execute a drug classification method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.

In this embodiment, the processor executes the specific functions of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 8, and the memory stores the program code and data required to execute these modules. The network interface is used for data transmission with user terminals or servers. The memory in this embodiment stores the program code and data required to execute all the submodules of the drug classification device, and the server can call this program code and data to execute the functions of all the submodules.

When the computer device classifies drugs, it collects the user's case information to obtain the drug names and the condition information corresponding to each drug, converts the condition information corresponding to each drug name into a target feature word vector, and inputs the target feature word vector into the unsupervised drug classification model. The drug classification model clusters together drugs that treat the same or similar conditions to form a cluster category, and each cluster category becomes one drug classification category. Finally, the drug classification is completed by labeling the drug names in each classification category. This classification method improves the efficiency of drug classification, and using case information further strengthens the correspondence between drugs and conditions, improving the accuracy of the classification results.

The present application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the drug classification method in any of the foregoing embodiments.

A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims

A method for classifying medicines, including:

Acquiring, according to the user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is obtained by extracting features from the text information through a neural network model, and the second word vector is obtained by filtering stop words out of the text information and then performing statistics;

Inputting the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;

Classifying and labeling the drugs used according to the cluster set of the drugs used output by the drug classification model, wherein the content of the classification label is at least one high-frequency word in the cluster set of the drugs used.
The method for classifying medicines according to claim 1, wherein said acquiring, according to the user's case information, the target feature word vector that characterizes the user's condition and the use of medicines comprises:

Converting the case information into a behavior vector set;

Inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model that has been pre-trained to a convergent state and is used to extract, from the behavior vector set, a user behavior vector representing the set;

Reading the user behavior vector output by the feature extraction model, and defining the user behavior vector as the first word vector.
The medicine classification method according to claim 2, wherein after reading the user behavior vector output by the feature extraction model and defining the user behavior vector as the first word vector, the method further comprises:

Filtering the case information through a preset stop word list to generate a keyword set;

Counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

Calculating the priority value of each keyword according to the word frequency and the inverse document frequency;

Generating the second word vector according to the priority value of each keyword.
The method for classifying medicines according to claim 3, wherein after generating the second word vector according to the priority value of each keyword, the method further comprises:

Combining the first word vector and the second word vector to generate the target feature word vector.
The medicine classification method according to claim 1, wherein the inputting the target feature word vector into a preset medicine classification model comprises:

Calculating the first Euclidean distance between the target feature word vector and different feature word vectors;

Comparing the first Euclidean distance with a preset first distance threshold;

When the first Euclidean distance is less than the first distance threshold, clustering the target feature vector into a cluster set represented by the first Euclidean distance to generate a first-level cluster set.
The method for classifying medicines according to claim 5, wherein after clustering the target feature vector into the cluster set represented by the first Euclidean distance to generate the first-level cluster set when the first Euclidean distance is less than the first distance threshold, the method further comprises:

Correcting the parameter value of the effective point spacing in the drug classification model to generate a first parameter value, and calculating, in the first-level cluster set, the second Euclidean distance between the target feature word vector and different feature word vectors;

Comparing the second Euclidean distance with a preset second distance threshold, where the second distance threshold is smaller than the first distance threshold;

When the second Euclidean distance is less than the second distance threshold, clustering the target feature vector into a cluster set represented by the second Euclidean distance to generate a secondary cluster set.
The method for classifying medicines according to claim 6, wherein after clustering the target feature vector into the cluster set represented by the second Euclidean distance to generate the secondary cluster set when the second Euclidean distance is less than the second distance threshold, the method further comprises:

Correcting the parameter value of the effective point spacing in the drug classification model to generate a second parameter value, and calculating, in the secondary cluster set, the third Euclidean distance between the target feature word vector and different feature word vectors, wherein the second parameter value is less than the first parameter value;

Comparing the third Euclidean distance with a preset third distance threshold, where the third distance threshold is smaller than the second distance threshold;

When the third Euclidean distance is less than the third distance threshold, clustering the target feature vector into a cluster set represented by the third Euclidean distance to generate a three-level cluster set.
A medicine classification device includes:

An obtaining module, configured to obtain, according to the user's case information, the target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is obtained by extracting features from the text information through a neural network model, and the second word vector is obtained by filtering stop words out of the text information and then performing statistics;

A processing module, configured to input the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;

An execution module, configured to classify and label the drugs used according to the cluster set of the drugs used output by the drug classification model, wherein the classification label is at least one high-frequency word in the cluster set of the drugs used.
A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute a method for classifying medicines, the method comprising the following steps:

Acquiring, according to the user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is obtained by extracting features from the text information through a neural network model, and the second word vector is obtained by filtering stop words out of the text information and then performing statistics;

Inputting the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;

Classifying and labeling the drugs used according to the cluster set of the drugs used output by the drug classification model, wherein the content of the classification label is at least one high-frequency word in the cluster set of the drugs used.
The computer device according to claim 9, wherein said acquiring a target feature word vector that characterizes the user's condition and the use of drugs according to the user's case information comprises:

Converting the case information into a behavior vector set;

Inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model that has been pre-trained to a convergent state and is used to extract, from the behavior vector set, a user behavior vector representing the set;

Reading the user behavior vector output by the feature extraction model, and defining the user behavior vector as the first word vector.
The computer device according to claim 10, wherein after reading the user behavior vector output by the feature extraction model and defining the user behavior vector as the first word vector, the method further comprises:

Filtering the case information through a preset stop word list to generate a keyword set;

Counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

Calculating the priority value of each keyword according to the word frequency and the inverse document frequency;

Generating the second word vector according to the priority value of each keyword.
The computer device according to claim 11, wherein after generating the second word vector according to the priority value of each keyword, the method further comprises:

Combining the first word vector and the second word vector to generate the target feature word vector.
The computer device according to claim 9, wherein the inputting the target feature word vector into a preset medicine classification model comprises:

Calculating the first Euclidean distance between the target feature word vector and different feature word vectors;

Comparing the first Euclidean distance with a preset first distance threshold;

When the first Euclidean distance is less than the first distance threshold, clustering the target feature vector into a cluster set represented by the first Euclidean distance to generate a first-level cluster set.
The computer device according to claim 13, wherein after clustering the target feature vector into the cluster set represented by the first Euclidean distance to generate the first-level cluster set when the first Euclidean distance is less than the first distance threshold, the method further comprises:

Correcting the parameter value of the effective point spacing in the drug classification model to generate a first parameter value, and calculating, in the first-level cluster set, the second Euclidean distance between the target feature word vector and different feature word vectors;

Comparing the second Euclidean distance with a preset second distance threshold, where the second distance threshold is less than the first distance threshold;

When the second Euclidean distance is less than the second distance threshold, clustering the target feature vector into a cluster set represented by the second Euclidean distance to generate a secondary cluster set.
The computer device according to claim 14, wherein after clustering the target feature vector into the cluster set represented by the second Euclidean distance to generate the secondary cluster set when the second Euclidean distance is less than the second distance threshold, the method further comprises:

Correcting the parameter value of the effective point spacing in the drug classification model to generate a second parameter value, and calculating, in the secondary cluster set, the third Euclidean distance between the target feature word vector and different feature word vectors, wherein the second parameter value is less than the first parameter value;

Comparing the third Euclidean distance with a preset third distance threshold, where the third distance threshold is smaller than the second distance threshold;

When the third Euclidean distance is less than the third distance threshold, clustering the target feature vector into a cluster set represented by the third Euclidean distance to generate a three-level cluster set.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute a method for classifying drugs, the method comprising the following steps:

Acquiring, according to the user's case information, a target feature word vector that characterizes the user's condition and the drugs used, wherein the case information is text information, the target feature word vector includes a first word vector and a second word vector, the first word vector is obtained by extracting features from the text information through a neural network model, and the second word vector is obtained by filtering stop words out of the text information and then performing statistics;

Inputting the target feature word vector into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating the distance between different feature word vectors;

Classifying and labeling the drugs used according to the cluster set of the drugs used output by the drug classification model, wherein the content of the classification label is at least one high-frequency word in the cluster set of the drugs used.
The storage medium according to claim 16, wherein the obtaining of the target feature word vector characterizing the user's condition and the use of drugs according to the user's case information comprises:

Converting the case information into a behavior vector set;

Inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model that has been pre-trained to a convergent state and is used to extract, from the behavior vector set, a user behavior vector representing the set;

Reading the user behavior vector output by the feature extraction model, and defining the user behavior vector as the first word vector.
The storage medium according to claim 17, wherein after reading the user behavior vector output by the feature extraction model and defining the user behavior vector as the first word vector, the method further comprises:

Filtering the case information through a preset stop word list to generate a keyword set;

Counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

Calculating the priority value of each keyword according to the word frequency and the inverse document frequency;

Generating the second word vector according to the priority value of each keyword.
The storage medium according to claim 18, wherein after generating the second word vector according to the priority value of each keyword, the method further comprises:

Combining the first word vector and the second word vector to generate the target feature word vector.
The storage medium according to claim 16, wherein said inputting the target feature word vector into a preset medicine classification model comprises:

Calculating the first Euclidean distance between the target feature word vector and different feature word vectors;

Comparing the first Euclidean distance with a preset first distance threshold;

When the first Euclidean distance is less than the first distance threshold, clustering the target feature vector into a cluster set represented by the first Euclidean distance to generate a first-level cluster set.