CN110781298A

CN110781298A - Medicine classification method and device, computer equipment and storage medium

Info

Publication number: CN110781298A
Application number: CN201910881521.5A
Authority: CN
Inventors: 陈娴娴; 阮晓雯; 徐亮
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2020-02-11
Anticipated expiration: 2039-09-18
Also published as: CN110781298B; SG11202008417RA; WO2020220635A1

Abstract

The embodiment of the invention discloses a medicine classification method, a medicine classification device, computer equipment and a storage medium, wherein the medicine classification method comprises the following steps: acquiring target characteristic word vectors representing the state of illness of the user and the medicine according to the case information of the user; inputting the target feature word vectors into a preset medicine classification model, wherein the medicine classification model is an unsupervised training model for clustering by calculating the distance between different feature word vectors; and performing classification labeling on the used medicines according to the clustering set of the used medicines output by the medicine classification model, wherein the classification labeling is at least one high-frequency word in the clustering set of the used medicines. The classification mode can improve the efficiency of medicine classification, and the corresponding relation between medicines and the illness state can be further strengthened by adopting case information, so that the accuracy of the classification result is improved.

Description

Medicine classification method and device, computer equipment and storage medium

Technical Field

The embodiment of the invention relates to the field of medicine classification, in particular to a medicine classification method, a medicine classification device, computer equipment and a storage medium.

Background

Drug classification management is an internationally adopted management approach. Medicines are divided into prescription and non-prescription medicines according to the principles of safety and effectiveness and according to differences in variety, specification, indication, dosage, administration route and the like, and corresponding management regulations are established. Its purpose is to safeguard the public's medication safety.

In the prior art, medicine classification mainly relies on supervised models, which require substantial labor to label samples in the early stage. Manual labeling is often inaccurate and the resulting classification incomplete, so substantial labor is also needed for maintenance operations such as adding and modifying categories. As a result, medicine classification is time-consuming and labor-intensive, and the classification accuracy is low.

Disclosure of Invention

The embodiment of the invention provides a medicine classification method, a medicine classification device, computer equipment and a storage medium, wherein the medicine classification can be finished without marking.

In order to solve the above technical problem, the embodiment of the present invention adopts the following technical solution: a medicine classification method is provided, comprising:

acquiring target characteristic word vectors representing the illness state and the medicine use of a user according to the case information of the user, wherein the case information is text information, the target characteristic word vectors comprise a first word vector and a second word vector, the first word vector is obtained by extracting the text information through a neural network model, and the second word vector is obtained by filtering stop words of the text information and then carrying out statistics;

inputting the target feature word vectors into a preset medicine classification model, wherein the medicine classification model is an unsupervised training model for clustering by calculating the distance between different feature word vectors;

and performing classification labeling on the used medicines according to the clustering set of the used medicines output by the medicine classification model, wherein the classification labeling is at least one high-frequency word in the clustering set of the used medicines.

Optionally, the target feature word vector includes a first word vector, and acquiring the target feature word vector representing the user's condition and medicine use according to the user's case information includes:

converting the case information into a behavior vector set;

inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model which is trained to a convergence state in advance and used for extracting, from the behavior vector set, the behavior vector representing the user;

and reading the user behavior vector output by the feature extraction model, and defining the user behavior vector as a first word vector.

Optionally, the target feature word vector includes a second word vector, and after reading the user behavior vector output by the feature extraction model and defining the user behavior vector as the first word vector, the method includes:

filtering the case information through a preset stop word list to generate a keyword set;

counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

calculating the priority value of each keyword according to the word frequency and the inverse document frequency;

and generating the second word vector according to the priority numerical value of each keyword.
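As an illustrative sketch only, the four steps above might be realized as follows in Python. The function name `tfidf_vectors`, the pre-tokenized input format, the unsmoothed logarithmic inverse document frequency, and the use of the plain product of word frequency and inverse document frequency as the "priority value" are all assumptions and are not taken from the embodiment.

```python
import math
from collections import Counter

def tfidf_vectors(documents, stop_words):
    """Return (vocabulary, vectors) with one priority-value vector per case record.

    documents: list of token lists (one list per case record; hypothetical format).
    stop_words: set of words to filter out.
    """
    # Step 1: filter each case record through the stop-word list.
    keyword_sets = [[w for w in doc if w not in stop_words] for doc in documents]
    vocabulary = sorted({w for kws in keyword_sets for w in kws})
    n_docs = len(keyword_sets)

    # Step 2: document frequency = number of records containing each keyword.
    doc_freq = Counter(w for kws in keyword_sets for w in set(kws))

    vectors = []
    for kws in keyword_sets:
        counts = Counter(kws)
        total = len(kws) or 1  # avoid division by zero for empty records
        vector = []
        for word in vocabulary:
            tf = counts[word] / total                # word frequency
            idf = math.log(n_docs / doc_freq[word])  # inverse document frequency
            vector.append(tf * idf)                  # Step 3: priority value
        vectors.append(vector)                       # Step 4: second word vector
    return vocabulary, vectors

docs = [
    ["patient", "has", "fever", "ibuprofen"],
    ["patient", "has", "cough", "syrup"],
]
vocabulary, vectors = tfidf_vectors(docs, stop_words={"has"})
```

In this toy input, "patient" appears in every record and therefore receives a priority value of zero, while "fever" and "ibuprofen" are weighted as distinctive for the first record.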

Optionally, after the generating the second word vector according to the priority value of each keyword, the method includes:

and combining the first word vector and the second word vector to generate the target characteristic word vector.
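The embodiment does not state how the two vectors are combined. One simple possibility, shown here purely as an assumption, is concatenation; the name `combine_vectors` and the sample values are hypothetical.

```python
def combine_vectors(first_vector, second_vector):
    """Concatenate the two word vectors into one target feature word vector.

    Concatenation is an assumed merge strategy, not one stated in the source.
    """
    return list(first_vector) + list(second_vector)

# Hypothetical values: a 2-dimensional first word vector from the neural
# network model and a 3-dimensional second word vector from word statistics.
target = combine_vectors([0.12, -0.40], [0.0, 0.23, 0.23])
```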

Optionally, the inputting the target feature word vector into a preset drug classification model includes:

calculating a first Euclidean distance between the target feature word vector and different feature word vectors;

comparing the first Euclidean distance with a preset first distance threshold;

and when the first Euclidean distance is smaller than the first distance threshold, clustering the target feature vector to a cluster set characterized by the first Euclidean distance to generate a primary cluster set.

Optionally, after clustering the target feature vector into the cluster set characterized by the first Euclidean distance to generate a primary cluster set when the first Euclidean distance is smaller than the first distance threshold, the method includes:

correcting the parameter value of the effective point distance in the medicine classification model to generate a first parameter value, and calculating a second Euclidean distance between the target feature word vector and different feature word vectors in the primary cluster set;

comparing the second Euclidean distance with a preset second distance threshold, wherein the second distance threshold is smaller than the first distance threshold;

and when the second Euclidean distance is smaller than the second distance threshold value, clustering the target feature vector to a cluster set characterized by the second Euclidean distance to generate a secondary cluster set.

Optionally, after clustering the target feature vector into the cluster set characterized by the second Euclidean distance to generate a secondary cluster set when the second Euclidean distance is smaller than the second distance threshold, the method includes:

correcting a parameter value of an effective point distance in the medicine classification model to generate a second parameter value, and calculating a third Euclidean distance between the target feature word vector and different feature word vectors in the secondary cluster set, wherein the second parameter value is smaller than the first parameter value;

comparing the third Euclidean distance with a preset third distance threshold, wherein the third distance threshold is smaller than the second distance threshold;

and when the third Euclidean distance is smaller than the third distance threshold, clustering the target feature vector to a cluster set characterized by the third Euclidean distance to generate a third-level cluster set.

In order to solve the above technical problem, an embodiment of the present invention further provides a medicine classification device, including:

an acquisition module, used for acquiring target feature word vectors representing the user's condition and medicine use according to the user's case information, wherein the case information is text information, the target feature word vectors comprise a first word vector and a second word vector, the first word vector is obtained by extracting the text information through a neural network model, and the second word vector is obtained by filtering stop words from the text information and then performing statistics;

the processing module is used for inputting the target feature word vectors into a preset medicine classification model, wherein the medicine classification model is an unsupervised training model for clustering by calculating the distance between different feature word vectors;

and the execution module is used for carrying out classification labeling on the used medicines according to the clustering set of the used medicines output by the medicine classification model, wherein the classification labeling is at least one high-frequency word in the clustering set of the used medicines.

Optionally, the target feature word vector includes a first word vector, and the medicine classification device comprises:

the first conversion sub-module is used for converting the case information into a behavior vector set;

the first processing submodule is used for inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model which is trained to be in a convergence state in advance and used for extracting a behavior vector of a user represented in the behavior vector set;

and the first execution submodule is used for reading the user behavior vector output by the feature extraction model and defining the user behavior vector as a first word vector.

Optionally, the target feature word vector includes a second word vector, and the medicine classification device comprises:

the first filtering submodule is used for filtering the case information through a preset stop word list to generate a keyword set;

the second processing submodule is used for counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

the first calculation submodule is used for calculating the priority numerical value of each keyword according to the word frequency and the inverse document frequency;

and the second execution submodule is used for generating the second word vector according to the priority numerical value of each keyword.

Optionally, the medicine classification device comprises:

and the first merging submodule is used for merging the first word vector and the second word vector to generate the target characteristic word vector.

Optionally, the medicine classification device comprises:

the first calculation submodule is used for calculating a first Euclidean distance between the target characteristic word vector and different characteristic word vectors;

the first comparison sub-module is used for comparing the first Euclidean distance with a preset first distance threshold;

and the third execution submodule is used for clustering the target feature vector to the clustering set characterized by the first Euclidean distance to generate a primary clustering set when the first Euclidean distance is smaller than the first distance threshold.

Optionally, the medicine classification device comprises:

the second calculation submodule is used for correcting the parameter value of the effective point distance in the medicine classification model to generate a first parameter value, and calculating a second Euclidean distance between the target characteristic word vector and different characteristic word vectors in the primary cluster set;

the second comparison submodule is used for comparing the second Euclidean distance with a preset second distance threshold, wherein the second distance threshold is smaller than the first distance threshold;

and the fourth execution submodule is used for clustering the target feature vector into the clustering set characterized by the second Euclidean distance to generate a secondary clustering set when the second Euclidean distance is smaller than the second distance threshold.

Optionally, the medicine classification device comprises:

the third calculation sub-module is used for correcting the parameter value of the effective point distance in the medicine classification model to generate a second parameter value, and calculating a third Euclidean distance between the target characteristic word vector and different characteristic word vectors in the secondary cluster set, wherein the second parameter value is smaller than the first parameter value;

a third comparison submodule, configured to compare the third euclidean distance with a preset third distance threshold, where the third distance threshold is smaller than the second distance threshold;

and a fifth execution submodule, configured to cluster the target feature vector into a cluster set characterized by the third euclidean distance to generate a third-level cluster set when the third euclidean distance is smaller than the third distance threshold.

In order to solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the drug classification method.

In order to solve the above technical problem, embodiments of the present invention further provide a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the drug classification method described above.

The beneficial effects of the embodiment of the invention are as follows: when classifying medicines, the name of each medicine and the condition information for which it is used can be obtained by collecting users' case information. The condition information corresponding to the medicine name is converted into a target feature word vector, which is fed as input data to an unsupervised medicine classification model. The model clusters medicines that treat the same or similar conditions into one cluster category, and each cluster category can serve as one category of the medicine classification. Finally, the medicine classification is completed by labeling the medicines in each category by name. This classification approach can improve the efficiency of medicine classification, and using case information further strengthens the correspondence between medicines and conditions, thereby improving the accuracy of the classification result.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the basic flow of a medicine classification method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of collecting a first word vector through a neural network model according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a process of extracting word vectors from a keyword set according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart illustrating the generation of a first-level cluster set according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart of generating a secondary cluster set according to an embodiment of the present invention;

FIG. 6 is a schematic flow chart of generating a tertiary cluster set according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of three-level classification according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the basic structure of a medicine classification device according to an embodiment of the present invention;

FIG. 9 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As will be appreciated by those skilled in the art, "terminal" as used herein includes both devices that are wireless signal receivers, devices that have only wireless signal receivers without transmit capability, and devices that include receive and transmit hardware, devices that have receive and transmit hardware capable of performing two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device having a single line display or a multi-line display or a cellular or other communication device without a multi-line display; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. As used herein, a "terminal Device" may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, or a smart tv, a set-top box, etc.

Referring to fig. 1, fig. 1 is a schematic view of a basic flow chart of the drug classification method according to the present embodiment.

As shown in fig. 1, a method for classifying a pharmaceutical product includes:

S1100, acquiring target characteristic word vectors representing the state of an illness and the use of a medicine of a user according to case information of the user, wherein the case information is text information, the target characteristic word vectors comprise a first word vector and a second word vector, the first word vector is obtained by extracting the text information through a neural network model, and the second word vector is obtained by filtering stop words of the text information and then carrying out statistics;

When a user seeks treatment at a hospital or clinic, the user's behavior information is recorded throughout the process. The recorded behavior information includes the user's condition information, medicine use information and test result information, and this behavior information is defined as the user's case information. The case information is text information, but is not limited thereto; depending on the application scenario, in some embodiments the case information further includes picture information and sound information.

After the case information is acquired, the target feature word vector is extracted from it; the target feature word vector is vector information representing the user's condition and medicine use. The target feature word vector can be extracted by a neural network model trained to a convergence state. In some embodiments, it can instead be calculated by counting the word frequency of keywords in the case information. In some embodiments, features are first extracted by the neural network model, then calculated by the word frequency statistical method, and the results of the two methods are finally combined to obtain the target feature word vector.

S1200, inputting the target feature word vectors into a preset medicine classification model, wherein the medicine classification model is an unsupervised training model for clustering by calculating distances among different feature word vectors;


In this embodiment, the medicine classification model is an unsupervised model that clusters feature word vectors. The model mainly calculates the inter-class distances between different feature word vectors, sets a distance threshold as the measure, and clusters feature word vectors whose inter-class distance is smaller than the distance threshold into one cluster set. A large number of feature word vectors, including the target feature word vector, are clustered into a plurality of cluster sets, and each cluster set is one classification category of medicines.
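The embodiment does not name a specific clustering algorithm. As a minimal sketch under that caveat, the following Python code uses an assumed greedy rule: a vector joins the first existing cluster that contains a member closer than the distance threshold, and otherwise starts a new cluster. The names `euclidean` and `threshold_cluster` and the sample vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Greedy threshold clustering; returns clusters as lists of vector indices."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            # Join the first cluster that has a member within the threshold.
            if any(euclidean(vec, vectors[j]) < threshold for j in cluster):
                cluster.append(i)
                break
        else:
            # No cluster is close enough, so start a new cluster set.
            clusters.append([i])
    return clusters

feature_vectors = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.0, 10.4]]
coarse = threshold_cluster(feature_vectors, threshold=1.0)
fine = threshold_cluster(feature_vectors, threshold=0.45)
```

Because the rule is greedy, the result depends on input order; a production implementation would more likely use an established density-based or hierarchical method.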

The calculation of the inter-class distance is actually to calculate the similarity of the disease condition information treated by different medicines, wherein the smaller the inter-class distance is, the closer the curative effects of the different medicines are, and the larger the inter-class distance is, the larger the difference between the curative effects of the different medicines is. Thus, different taxonomic categories can achieve different cures or effects.

In some embodiments, to further refine the classification categories of medicines, the categories are divided into different levels: after the first-level division is completed, further classification is performed within each cluster set. This is done by reducing the distance threshold so that the feature word vectors within a cluster set are further distinguished. At the same time, the parameter value of the effective point distance is reduced so that the intra-class distances converge further; this convergence in turn increases the inter-class distances, which increases the differentiation among feature word vectors within a cluster set and provides good conditions for further subdividing it.

With this method, by continually adjusting the distance threshold and the effective point distance parameter value, progressively finer classification can be carried out within cluster sets of different levels, forming classification categories with attribute distribution. In some embodiments the cluster sets are divided into 3 levels, but this is not limiting; they can also be divided into 1, 2, 4, 5 or more levels.
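The level-by-level refinement can be sketched as a recursion over a decreasing list of distance thresholds. This is an assumption-laden illustration: it models only the shrinking distance threshold, not the effective point distance parameter, and it uses an assumed greedy grouping rule. All function names and sample values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(indices, vectors, threshold):
    """Greedily group the given indices; a vector joins the first cluster
    containing a member closer than the threshold."""
    clusters = []
    for i in indices:
        for cluster in clusters:
            if any(euclidean(vectors[i], vectors[j]) < threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def multilevel_cluster(vectors, thresholds, indices=None):
    """Return a tree of cluster sets, one level per threshold.

    thresholds should be strictly decreasing, for example [first, second, third].
    Each node is {"members": [...], "children": [...]}.
    """
    if indices is None:
        indices = list(range(len(vectors)))
    if not thresholds:
        return []
    nodes = []
    for cluster in threshold_cluster(indices, vectors, thresholds[0]):
        nodes.append({
            "members": cluster,
            # Re-cluster inside this cluster set with the next, smaller threshold.
            "children": multilevel_cluster(vectors, thresholds[1:], cluster),
        })
    return nodes

vectors = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [20.0, 0.0]]
tree = multilevel_cluster(vectors, thresholds=[5.0, 1.0])
```

Adding a third, smaller threshold to the list would produce the three-level division described in the embodiment.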

S1300, classifying and labeling the used medicines according to the clustering set of the used medicines output by the medicine classification model, wherein the classification label is at least one high-frequency word in the clustering set of the used medicines.

Each cluster set, and the medicines in the last-level cluster set, are labeled according to the cluster sets output by the medicine classification model. A cluster set is labeled as follows: the word that occurs most frequently in the case information of the medicines in the cluster set is extracted as the label name of the cluster set. In some embodiments, when the same cluster set has multiple label names, the words are selected in order of occurrence frequency. A medicine name is labeled by extracting the medicine name directly from the case information.
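A minimal sketch of the cluster labeling step follows. It assumes each case record in a cluster set has already been reduced to a list of keywords; the name `label_cluster` and the sample keywords are hypothetical.

```python
from collections import Counter

def label_cluster(case_keywords, top_n=1):
    """Return the top_n most frequent words across a cluster's case records.

    case_keywords: list of keyword lists, one per case record in the cluster set.
    """
    counts = Counter(word for keywords in case_keywords for word in keywords)
    # most_common sorts by descending frequency, matching the ordered
    # selection of multiple label names described above.
    return [word for word, _ in counts.most_common(top_n)]

cluster_cases = [
    ["fever", "ibuprofen"],
    ["fever", "headache", "paracetamol"],
    ["fever", "headache"],
]
single_label = label_cluster(cluster_cases)
two_labels = label_cluster(cluster_cases, top_n=2)
```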

In the above embodiment, when classifying medicines, the name of each medicine and the condition information for which it is used can be obtained by collecting users' case information. The condition information corresponding to the medicine name is converted into a target feature word vector, which is fed as input data to an unsupervised medicine classification model. The model clusters medicines that treat the same or similar conditions into one cluster category, and each cluster category can serve as one category of the medicine classification. Finally, the medicine classification is completed by labeling the medicines in each category by name. This classification approach can improve the efficiency of medicine classification, and using case information further strengthens the correspondence between medicines and conditions, thereby improving the accuracy of the classification result.

In some embodiments, the case information needs to be feature extracted by a neural network model. Referring to fig. 2, fig. 2 is a schematic flow chart illustrating the process of acquiring the first word vector through the neural network model according to the present embodiment.

As shown in fig. 2, S1100 includes:

S1111, converting the case information into a behavior vector set;

The case information is converted into a vector set that the neural network model can recognize and process; in this embodiment the conversion is performed by a word2vec model. However, this is not limiting: depending on the application scenario, in some embodiments the case information can also be converted into vectors by the TF-IDF (term frequency-inverse document frequency) technique.

S1112, inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model which is trained to a convergence state in advance and used for extracting, from the behavior vector set, the behavior vector representing the user;

The converted vector set is input into the preset feature extraction model, which is a neural network model trained to a convergence state in advance and used for extracting the behavior vectors representing the user from the behavior vector set.

In this embodiment, the feature extraction model is used to extract word vectors associated with the disease information and the drug information of the user in the vector set.

In order to enable the feature extraction model to accurately extract the word vectors associated with the user's condition information and medicine information, the feature extraction model needs to be trained. The training proceeds as follows. A training sample set is collected, consisting of a number of vector sets converted from case information, and the word vectors in each vector set are manually labeled. The labeled vector sets are then input into the neural network model one by one. After the model extracts an excitation word vector, the distance between the excitation word vector and the labeled word vector is calculated; if the distance is greater than a set distance threshold, the weights of the neural network model are corrected through the back-propagation algorithm. These steps are repeated until the distance between the excitation word vector and the labeled word vector is smaller than the set distance threshold, at which point training on that vector set is complete. Every vector set in the training sample set is trained in this way, and training ends when the accuracy with which the neural network model extracts word vectors exceeds a set value (for example, 98%). The trained neural network model is the feature extraction model.
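The threshold-driven training loop can be illustrated with a deliberately simplified stand-in. The sketch below replaces the neural network with a single linear layer trained by gradient descent on the squared Euclidean distance, and omits the final accuracy check; the network architecture, learning rate, step cap and sample values are all assumptions rather than details from the embodiment.

```python
import math

def forward(weights, x):
    """Compute the model output (the 'excitation' vector) for input x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def distance(a, b):
    """Euclidean distance between the output and the labeled vector."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def train(samples, in_dim, out_dim, threshold=0.01, lr=0.1, max_steps=1000):
    """Correct the weights on each sample until its distance falls below threshold.

    samples: list of (input_vector, labeled_vector) pairs (hypothetical format).
    max_steps caps the updates per sample so the loop always terminates.
    """
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    for x, label in samples:
        for _ in range(max_steps):
            output = forward(weights, x)
            if distance(output, label) < threshold:
                break  # this sample is considered trained
            # Gradient of the squared distance with respect to each weight.
            for i in range(out_dim):
                grad_scale = 2.0 * (output[i] - label[i])
                for j in range(in_dim):
                    weights[i][j] -= lr * grad_scale * x[j]
    return weights

sample_input, sample_label = [1.0, 0.5], [0.2, -0.1]
trained = train([(sample_input, sample_label)], in_dim=2, out_dim=2)
final_distance = distance(forward(trained, sample_input), sample_label)
```

A real feature extraction model would have multiple layers with back-propagation through each, and would be evaluated against the accuracy target before training stops.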

The feature extraction model trained to the convergence state can accurately extract word vectors associated with the disease condition information and the medicine information of the user in the vector set, and the word vectors are the user behavior vectors.

S1113, reading the user behavior vector output by the feature extraction model, and defining the user behavior vector as a first word vector.

The user behavior vector output by the feature extraction model is read. The user behavior vector represents the word vectors associated with the user's disease condition information and medicine information, and the extracted user behavior vector can therefore be used as input data of the medicine classification model. In this embodiment, the user behavior vector is defined as a first word vector.

Word vectors carrying the key information can be rapidly extracted by the neural network model trained to convergence, which simplifies the data processing of the medicine classification model and improves its processing efficiency.

In some embodiments, in order to collect more completely the user's medical condition information and drug information recorded in the case information, and to reduce the rate at which key information is missed, further extraction by a keyword extraction method is required. Referring to fig. 3, fig. 3 is a schematic diagram illustrating a process of extracting word vectors from a keyword set according to the present embodiment.

As shown in fig. 3, after S1113, the method includes:

S1121, filtering the case information through a preset stop word list to generate a keyword set;

In this embodiment, in order to further filter out of the case information any information irrelevant to the user's condition information and medicine information, the case information is filtered using stop words, that is, the words to be filtered out.

A stop word list is established in which the stop words obtained through statistics are recorded; for example, words whose parts of speech are verbs, adverbs, or adjectives are set as stop words. After screening against the stop word list, the words with those parts of speech are removed from the case information, and the case information with the stop words removed forms a keyword set, in which the keywords of the user's disease information and medicine information are recorded.
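As a rough illustration of the filtering step, the sketch below removes every token found in a stop word list. The stop word list, the sample sentence, and the whitespace tokenization are hypothetical; the source does not specify how the case text is segmented or which words the list contains.

```python
def filter_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop word list, preserving order."""
    return [t for t in tokens if t not in stop_words]

# Hypothetical stop word list and case text, for illustration only.
STOP_WORDS = {"the", "patient", "reports", "severe", "and", "was", "given"}
tokens = "the patient reports severe headache and was given ibuprofen".split()
keywords = filter_stop_words(tokens, STOP_WORDS)
```

Here `keywords` keeps only the condition and medicine terms, which is the role the keyword set plays in the following steps.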

S1122, counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword;

After the keyword set is obtained through filtering, the word frequency of each keyword in the keyword set is calculated. The word frequency is calculated as follows:

After the word frequency of each keyword is calculated, the inverse document frequency of each keyword is calculated to determine the importance of each keyword. The inverse document frequency is generally inversely proportional to how common a word is, and is calculated as follows:
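The formulas themselves are not reproduced in this text. For orientation only, one commonly used formulation of the two quantities is given below; it is an assumption, not necessarily the exact form used in this embodiment. Here $n_w$ is the number of occurrences of keyword $w$ in the keyword set, the sum runs over all keywords $k$ in that set, $N$ is the total number of documents in the corpus, and the denominator of the IDF counts the documents containing $w$, with 1 added to avoid division by zero.

```latex
\mathrm{TF}(w) = \frac{n_w}{\sum_{k} n_k},
\qquad
\mathrm{IDF}(w) = \log \frac{N}{1 + \lvert \{ d : w \in d \} \rvert}
```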

S1123, calculating the priority value of each keyword according to the word frequency and the inverse document frequency;

After the word frequency and the inverse document frequency of each keyword are calculated, the two are multiplied to obtain the priority value of each keyword. The keywords are sorted in descending order of priority value, and the top-ranked keywords are selected as the keywords to be converted according to actual needs; for example, the top 20 keywords are extracted as the keywords to be converted. The number of keywords to be converted is not limited to this, however, and in some embodiments it can be any number, depending on the specific application scenario.
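The priority calculation and top-ranked selection can be sketched as follows. The TF and IDF definitions used here (relative count, and the logarithm of the document count over one plus the number of documents containing the word) are a common convention assumed for illustration; the function names and the toy corpus are likewise hypothetical.

```python
import math
from collections import Counter

def tf_idf_priorities(doc_keywords, corpus):
    """Priority value of each keyword: word frequency times inverse document frequency."""
    counts = Counter(doc_keywords)
    total = len(doc_keywords)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        containing = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / (1 + containing))  # assumed smoothing convention
        scores[word] = tf * idf
    return scores

def top_keywords(scores, k):
    """Keywords sorted by descending priority value, truncated to the top k."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# Toy corpus of keyword sets (assumed data).
doc = ["headache", "ibuprofen", "headache", "fever"]
corpus = [doc, ["fever", "paracetamol"], ["cough", "fever"], ["cough", "syrup"]]
scores = tf_idf_priorities(doc, corpus)
selected = top_keywords(scores, 2)
```

In this toy corpus "fever" appears in most documents, so its priority value drops to zero and it is not selected, which illustrates how common words are ranked down.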

S1124, generating the second word vector according to the priority value of each keyword.

The keywords are sorted in descending order of priority value, and the top-ranked keywords are selected as the keywords to be converted according to actual needs. The keywords to be converted, screened by priority value, are then converted into the second word vector through a word2vec model or a TF-IDF technique.

In some embodiments, the first word vector is extracted by the neural network model. The association between the word vectors extracted by the neural network model and the text information essentially reflects human subjective intent and is obtained through repeated directed training; moreover, the neural network model is difficult to converge when multiple association relationships are cross-trained. The extracted first word vector may therefore be insufficiently comprehensive or may omit keyword vectors. The second word vector is obtained by statistics after the stop words are filtered out; the statistics carry no personal intent, reflect the distribution of each keyword most directly, and extract keywords more comprehensively, but without emphasis. The first word vector and the second word vector are therefore merged to generate the target feature word vector. The merged data is more comprehensive: it highlights the feature word vectors that people care about while also taking the objective feature word vectors into account, so the extracted data is both comprehensive and focused, which helps improve the accuracy of the medicine classification model. Please refer to step S1131.

S1131, merging the first word vector and the second word vector to generate the target feature word vector.

The first word vector and the second word vector are merged as follows: the word vector matrix formed by the first word vector is added to the word vector matrix formed by the second word vector. The result is the vector matrix of the target feature word vector, and this vector matrix is the input data of the medicine classification model.
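A minimal sketch of the merge by addition is shown below. It assumes the two word vector matrices have the same shape, which the source does not state explicitly; the function name and sample values are illustrative.

```python
def merge_word_vectors(first, second):
    """Element-wise sum of two word vector matrices of identical shape."""
    same_rows = len(first) == len(second)
    if not same_rows or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("matrices must have the same shape")
    return [[x + y for x, y in zip(a, b)] for a, b in zip(first, second)]

# Assumed sample matrices: one row per word vector.
first_matrix = [[0.5, 1.0], [0.0, 2.0]]
second_matrix = [[0.25, 0.0], [1.0, 0.5]]
target_matrix = merge_word_vectors(first_matrix, second_matrix)
```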

In some embodiments, when the drug classification model generates a primary cluster set, the cluster set to which a target feature word vector belongs is determined by calculating the Euclidean distances between the target feature word vector and different feature word vectors. Referring to fig. 4, fig. 4 is a schematic flow chart illustrating the generation of the primary cluster set according to the present embodiment.

As shown in fig. 4, S1200 includes:

S1211, calculating a first Euclidean distance between the target feature word vector and different feature word vectors;

When the drug classification model classifies the target feature word vector, the distance between the target feature word vector and other feature word vectors needs to be calculated. Specifically, the Euclidean distance between the target feature word vector and each different feature word vector is calculated; these Euclidean distances are collectively referred to as the first Euclidean distance. The calculation is not limited thereto, however: in some embodiments, a Mahalanobis distance or a cosine distance between the target feature word vector and a different feature word vector is calculated instead.
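Two of the distance measures named above can be sketched as follows. The function names are illustrative, and the cosine distance is taken here as one minus the cosine similarity, which is a common convention assumed for this sketch. The Mahalanobis distance is omitted because it additionally requires a covariance matrix estimated from the data.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity (assumed convention)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to vector magnitude, whereas cosine distance compares direction only, so the two can rank neighbours differently.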

S1212, comparing the first Euclidean distance with a preset first distance threshold;

The first Euclidean distance between the target feature word vector and each different feature word vector is compared with a set first distance threshold. The first distance threshold is the threshold for measuring whether feature word vectors meet the first screening condition; for example, its value is 0.5.

Comparing the first Euclidean distance with the preset first distance threshold makes it possible to judge which feature word vector, or which class of feature word vectors, the target feature word vector should be clustered with.

S1213, when the first Euclidean distance is smaller than the first distance threshold, clustering the target feature vector to a cluster set characterized by the first Euclidean distance to generate a primary cluster set.

When the comparison shows that the first Euclidean distance between the target feature word vector and a certain feature word vector is smaller than the first distance threshold, the target feature word vector should be clustered into the cluster set in which that feature word vector is located. After the target feature word vectors of all case information are clustered, a primary cluster set is generated, and the primary cluster set is composed of at least one cluster set.
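Steps S1211 to S1213 can be sketched as a simple threshold-based grouping. The sketch assumes that a vector joins the first existing cluster containing any member closer than the threshold, and that a vector with no such neighbour starts a new cluster; the source does not spell out either rule, and the function names and sample vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Group vectors so each joins a cluster holding a member closer than threshold."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if any(euclidean(v, member) < threshold for member in cluster):
                cluster.append(v)
                break
        else:
            # No existing member is close enough: start a new cluster (assumption).
            clusters.append([v])
    return clusters

# Assumed sample feature word vectors, clustered with the example threshold 0.5.
vectors = [(0.0, 0.0), (0.3, 0.0), (2.0, 2.0), (2.2, 2.0)]
primary = threshold_cluster(vectors, 0.5)
```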

In some embodiments, the drug classification model generates a secondary cluster set by further refined clustering on the basis of the primary cluster set. Referring to fig. 5, fig. 5 is a schematic flow chart of generating a secondary cluster set according to the present embodiment.

As shown in fig. 5, after S1213, the method includes:

S1221, correcting the parameter value of the effective point distance in the medicine classification model to generate a first parameter value, and calculating a second Euclidean distance between the target feature word vector and different feature word vectors in the primary cluster set;

Before the medicine classification model performs secondary clustering, the parameter of the effective point distance in the medicine classification model needs to be adjusted. The effective point distance determines which class-to-class distances between the feature word vectors are not ignored. For the sake of computational efficiency, the class-to-class distances are screened before they are calculated: a parameter value of the effective point distance is set, and class-to-class distances smaller than that parameter value are judged invalid. Reducing the parameter value of the effective point distance therefore increases the diversity of the class-to-class distances, exposes more of the detail of each feature word vector, and increases the differences among the feature word vectors within the same cluster, which facilitates secondary clustering. The corrected parameter value of the effective point distance is the first parameter value, and it is smaller than the value of the effective point distance set in the medicine classification model before primary clustering.

After the first parameter value is set, the medicine classification model performs secondary clustering within each cluster set of the primary cluster set. The secondary clustering proceeds as follows: a second Euclidean distance is calculated between the target feature word vector and the other feature word vectors in the cluster set in which the target feature word vector is located. The calculation is not limited thereto, however: in some embodiments, a Mahalanobis distance or a cosine distance between the target feature word vector and a different feature word vector is calculated instead.

S1222, comparing the second euclidean distance with a preset second distance threshold, where the second distance threshold is smaller than the first distance threshold;

The second Euclidean distance between the target feature word vector and each different feature word vector is compared with a set second distance threshold. The second distance threshold is the threshold for measuring whether feature word vectors meet the second screening condition; for example, its value is 0.1.

Comparing the second Euclidean distance with the preset second distance threshold makes it possible to judge which feature word vector or feature word vectors the target feature word vector should be clustered with inside the cluster set in which it is located.

S1223, when the second Euclidean distance is smaller than the second distance threshold, clustering the target feature vector to a cluster set characterized by the second Euclidean distance to generate a secondary cluster set.

When the comparison shows that the second Euclidean distance between the target feature word vector and a certain feature word vector is smaller than the second distance threshold, the target feature word vector should be clustered into the cluster set in which that feature word vector is located. After the feature word vectors in all the cluster sets are clustered, a secondary cluster set is generated, and the secondary cluster set consists of at least one cluster set.
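The secondary clustering can be sketched as re-running a threshold-based grouping inside each primary cluster with the smaller second threshold. The grouping rule, the function names, and the sample data below are assumptions for illustration; the effective point distance parameter is not modeled, because the source does not define how it enters the calculation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Assumed rule: join the first cluster with a member closer than threshold."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if any(euclidean(v, member) < threshold for member in cluster):
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def refine(primary_clusters, second_threshold):
    """Split every primary cluster into secondary clusters."""
    return [threshold_cluster(c, second_threshold) for c in primary_clusters]

# Assumed primary cluster set, refined with the example second threshold 0.1.
primary = [[(0.0, 0.0), (0.05, 0.0), (0.3, 0.0)], [(2.0, 2.0)]]
secondary = refine(primary, 0.1)
```

Because the clustering is restricted to the members of each primary cluster, a secondary cluster never spans two primary clusters.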

In some embodiments, the drug classification model generates a tertiary cluster set by further refined clustering on the basis of the secondary cluster set. Referring to fig. 6, fig. 6 is a schematic flow chart illustrating the generation of the tertiary cluster set according to the present embodiment.

As shown in fig. 6, after S1223, the method includes:

S1231, correcting the parameter value of the effective point distance in the medicine classification model to generate a second parameter value, and calculating a third Euclidean distance between the target feature word vector and different feature word vectors in the secondary cluster set, wherein the second parameter value is smaller than the first parameter value;

Before the medicine classification model performs tertiary clustering, the parameter of the effective point distance in the medicine classification model needs to be adjusted. The effective point distance determines which class-to-class distances between the feature word vectors are not ignored. For the sake of computational efficiency, the class-to-class distances are screened before they are calculated: a parameter value of the effective point distance is set, and class-to-class distances smaller than that parameter value are judged invalid. Reducing the parameter value of the effective point distance therefore increases the diversity of the class-to-class distances, exposes more of the detail of each feature word vector, and increases the differences among the feature word vectors within the same cluster, which facilitates tertiary clustering. The corrected parameter value of the effective point distance is the second parameter value, and it is smaller than the first parameter value.

After the second parameter value is set, the medicine classification model performs tertiary clustering within each cluster set of the secondary cluster set. The tertiary clustering proceeds as follows: a third Euclidean distance is calculated between the target feature word vector and the other feature word vectors in the cluster set in which the target feature word vector is located. The calculation is not limited thereto, however: in some embodiments, a Mahalanobis distance or a cosine distance between the target feature word vector and a different feature word vector is calculated instead.

S1232, comparing the third Euclidean distance with a preset third distance threshold, wherein the third distance threshold is smaller than the second distance threshold;

The third Euclidean distance between the target feature word vector and each different feature word vector is compared with a set third distance threshold. The third distance threshold is the threshold for measuring whether feature word vectors meet the third screening condition; for example, its value is 0.05.

Comparing the third Euclidean distance with the preset third distance threshold makes it possible to judge which feature word vector or feature word vectors the target feature word vector should be clustered with inside the cluster set in which it is located.

S1233, when the third Euclidean distance is smaller than the third distance threshold, clustering the target feature vector to the cluster set characterized by the third Euclidean distance to generate a tertiary cluster set.

When the comparison shows that the third Euclidean distance between the target feature word vector and a certain feature word vector is smaller than the third distance threshold, the target feature word vector should be clustered into the cluster set in which that feature word vector is located. After the feature word vectors in all the cluster sets are clustered, a tertiary cluster set is generated, and the tertiary cluster set consists of at least one cluster set. At this point the three-level classification of the medicine is complete. The number of classification levels is not limited thereto, however: in some embodiments, the parameter value of the effective point distance and the distance threshold are corrected further, so that still finer classification can be performed.

Referring to fig. 7, fig. 7 is a schematic diagram of three-level classification according to the present embodiment.

As shown in fig. 7, the medicine is classified into three levels, which are: a primary cluster set 11, a secondary cluster set 12, and a tertiary cluster set 13. The clusters at three different levels are arranged in a tree.
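The tree arrangement of the three levels can be sketched by applying a threshold-based grouping recursively with a decreasing list of thresholds. The example thresholds 0.5, 0.1 and 0.05 come from the description; the grouping rule, the nested-list representation of the tree, and the one-dimensional sample vectors are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Assumed rule: join the first cluster with a member closer than threshold."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if any(euclidean(v, member) < threshold for member in cluster):
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def build_tree(vectors, thresholds):
    """Nested lists: one nesting level per threshold, leaves are the vectors."""
    if not thresholds:
        return list(vectors)
    groups = threshold_cluster(vectors, thresholds[0])
    return [build_tree(group, thresholds[1:]) for group in groups]

# Assumed one-dimensional feature word vectors.
vectors = [(0.0,), (0.04,), (0.095,), (0.3,), (5.0,)]
tree = build_tree(vectors, [0.5, 0.1, 0.05])
```

Each element of `tree` is a primary cluster, each element inside it is a secondary cluster, and each element inside that is a tertiary cluster. Adding a further, smaller threshold to the list would add another level.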

In order to solve the above technical problem, an embodiment of the present invention further provides a drug classification device.

Referring to fig. 8, fig. 8 is a schematic view of a basic structure of the drug classification device according to the present embodiment.

As shown in fig. 8, a drug classification device includes: an obtaining module 2100, a processing module 2200, and an executing module 2300. The obtaining module 2100 is configured to obtain, according to case information of a user, a target feature word vector representing the user's disease condition and the medicine used, where the case information is text information; the processing module 2200 is configured to input the target feature word vectors into a preset drug classification model, where the drug classification model is an unsupervised training model that performs clustering by calculating distances between different feature word vectors; the executing module 2300 is configured to label the classification information of the used drugs according to the cluster set of the used drugs output by the drug classification model.

When the drug classification device classifies medicines, the medicine names and the information on the conditions that the medicines are used to treat can be obtained by collecting the case information of users. The condition information corresponding to each medicine name is converted into a target feature word vector, and the target feature word vector is input into the unsupervised medicine classification model as input data. The medicine classification model clusters together the medicines that treat the same or similar conditions to form a cluster class, and each cluster class can serve as one class in the medicine classification. Finally, the drug classification is completed by labeling the drugs in each class with a name. This classification mode can improve the efficiency of medicine classification, and the use of case information further strengthens the correspondence between medicines and conditions, thereby improving the accuracy of the classification result.
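The final labeling step, in which a cluster is named after at least one high-frequency word in it, can be sketched with a simple frequency count. The function name and the sample keyword records are hypothetical; only the idea of using the most frequent word as the label comes from the source.

```python
from collections import Counter

def label_cluster(cluster_keywords, top_n=1):
    """Return the top_n most frequent words across all records in a cluster."""
    counts = Counter(word for record in cluster_keywords for word in record)
    return [word for word, _ in counts.most_common(top_n)]

# Assumed keyword records of the case information grouped into one cluster.
cluster = [
    ["headache", "ibuprofen"],
    ["headache", "aspirin"],
    ["migraine", "headache"],
]
label = label_cluster(cluster)
```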

In some embodiments, the target feature word vector comprises a first word vector, and the drug classification device includes a first conversion submodule, a first processing submodule, and a first execution submodule. The first conversion submodule is used for converting the case information into a behavior vector set; the first processing submodule is used for inputting the behavior vector set into a preset feature extraction model, wherein the feature extraction model is a neural network model trained in advance to a convergence state and used for extracting, from the behavior vector set, the behavior vectors representing the user; the first execution submodule is used for reading the user behavior vector output by the feature extraction model and defining the user behavior vector as a first word vector.

In some embodiments, the target feature word vector comprises a second word vector, and the drug classification device includes a first filtering submodule, a second processing submodule, a first calculating submodule, and a second execution submodule. The first filtering submodule is used for filtering case information through a preset stop word list to generate a keyword set; the second processing submodule is used for counting the word frequency of each keyword in the keyword set and the inverse document frequency of each keyword; the first calculating submodule is used for calculating the priority value of each keyword from the word frequency and the inverse document frequency; and the second execution submodule is used for generating a second word vector according to the priority value of each keyword.

In some embodiments, the drug classification device includes a first merging submodule, which is used for merging the first word vector and the second word vector to generate a target feature word vector.

In some embodiments, the drug classification device includes a first calculation submodule, a first comparison submodule, and a third execution submodule. The first calculation submodule is used for calculating a first Euclidean distance between a target feature word vector and different feature word vectors; the first comparison submodule is used for comparing the first Euclidean distance with a preset first distance threshold; and the third execution submodule is used for clustering the target feature vector to the cluster set characterized by the first Euclidean distance to generate a primary cluster set when the first Euclidean distance is smaller than the first distance threshold.

In some embodiments, the drug classification device includes a second calculation submodule, a second comparison submodule, and a fourth execution submodule. The second calculation submodule is used for correcting the parameter value of the effective point distance in the medicine classification model to generate a first parameter value, and calculating a second Euclidean distance between a target feature word vector and different feature word vectors in the primary cluster set; the second comparison submodule is used for comparing the second Euclidean distance with a preset second distance threshold, wherein the second distance threshold is smaller than the first distance threshold; and the fourth execution submodule is used for clustering the target feature vector to the cluster set characterized by the second Euclidean distance to generate a secondary cluster set when the second Euclidean distance is smaller than the second distance threshold.

In some embodiments, the drug classification device includes a third calculation submodule, a third comparison submodule, and a fifth execution submodule. The third calculation submodule is used for correcting the parameter value of the effective point distance in the medicine classification model to generate a second parameter value, and calculating a third Euclidean distance between a target feature word vector and different feature word vectors in the secondary cluster set, wherein the second parameter value is smaller than the first parameter value; the third comparison submodule is used for comparing the third Euclidean distance with a preset third distance threshold, wherein the third distance threshold is smaller than the second distance threshold; and the fifth execution submodule is used for clustering the target feature vector to the cluster set characterized by the third Euclidean distance to generate a tertiary cluster set when the third Euclidean distance is smaller than the third distance threshold.

In order to solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.

As shown in fig. 9, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer readable instructions; the database may store control information sequences, and the computer readable instructions, when executed by the processor, may cause the processor to implement a drug classification method. The processor of the computer device provides calculation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, may cause the processor to perform a drug classification method. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In this embodiment, the processor is configured to execute the specific functions of the obtaining module 2100, the processing module 2200, and the executing module 2300 in fig. 8, and the memory stores the program codes and various data required for executing these modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data necessary for executing all the sub-modules in the drug classification device, and the server can call these program codes and data to execute the functions of all the sub-modules.

When the computer device classifies medicines, the medicine names and the information on the conditions that the medicines are used to treat can be obtained by collecting the case information of users. The condition information corresponding to each medicine name is converted into a target feature word vector, and the target feature word vector is input into the unsupervised medicine classification model as input data. The medicine classification model clusters together the medicines that treat the same or similar conditions to form a cluster class, and each cluster class can serve as one class in the medicine classification. Finally, the drug classification is completed by labeling the drugs in each class with a name. This classification mode can improve the efficiency of medicine classification, and the use of case information further strengthens the correspondence between medicines and conditions, thereby improving the accuracy of the classification result.

The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the drug classification method of any of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Claims

1. A method of classifying a pharmaceutical product, comprising:

and performing classification labeling on the used medicines according to the clustering set of the used medicines output by the medicine classification model, wherein the classification labeling content is at least one high-frequency word in the clustering set of the used medicines.

2. The drug classification method according to claim 1, wherein the obtaining of the target feature word vector characterizing the condition of a patient and the use of a drug according to the case information of the user comprises:

converting the case information into a behavior vector set;

3. The method for classifying drugs according to claim 2, wherein after reading the user behavior vector output by the feature extraction model and defining the user behavior vector as a first word vector, the method comprises:

4. The method for classifying drugs according to claim 3, wherein the step of generating the second word vector according to the priority value of each keyword comprises:

5. The method for classifying drugs according to claim 1, wherein the inputting the target feature word vector into a preset drug classification model comprises:

comparing the first Euclidean distance with a preset first distance threshold;

6. The method for classifying drugs according to claim 5, wherein after clustering the target feature vector into the cluster set characterized by the first Euclidean distance to generate a primary cluster set when the first Euclidean distance is smaller than the first distance threshold, the method comprises:

7. The method for classifying drugs according to claim 6, wherein after clustering the target feature vector into the cluster set characterized by the second Euclidean distance to generate a secondary cluster set when the second Euclidean distance is smaller than the second distance threshold, the method comprises:

8. A drug classification device, comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to carry out the steps of the drug classification method according to any one of claims 1 to 7.

10. A storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the drug classification method of any one of claims 1 to 7.