CN112035664A - Medicine classification method and device and computer equipment - Google Patents

Medicine classification method and device and computer equipment

Info

Publication number
CN112035664A
Authority
CN
China
Prior art keywords
medicine
information
message queue
data information
drug
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010888451.9A
Other languages
Chinese (zh)
Inventor
操文彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202010888451.9A
Publication of CN112035664A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a method, a device and computer equipment for classifying medicines, wherein the method comprises the following steps: acquiring data information of each medicine in a database of a hospital; pre-classifying the medicines according to the data information of the medicines; sequentially sending the data information of the medicines to corresponding standardized processing models for standardized processing to obtain medicine information of each medicine after standardized processing; and classifying each medicine in the standard database according to the medicine information after the standardization processing. The invention has the beneficial effects that: by the medicine classifying method, the early preparation work of self-correction and self-check can be completed, the medicine information of each medicine is standardized, a large amount of labor investment for recoding and comparing each medicine in a hospital database in the self-correction and self-check work is reduced, the time consumption of code matching work is reduced, and errors caused by manual code matching are reduced.

Description

Medicine classification method and device and computer equipment
Technical Field
The invention relates to the field of medical science and technology, in particular to a medicine classification method and device and computer equipment.
Background
In self-checking and self-correction work, the three-catalog codes of medicines are disordered and difficult to manage and maintain because medical insurance offices and hospitals differ in their degree of informatization. Such work has a short cycle, heavy tasks and a large workload, so manually matching medicine codes consumes a large amount of the available time. Therefore, a method for classifying drugs is needed to realize automatic code matching of drugs.
Disclosure of Invention
The invention mainly aims to provide a medicine classification method, a medicine classification device and computer equipment, and aims to solve the problem that a large amount of time is consumed for manually code matching of medicines.
The invention provides a method for classifying drugs, which comprises the following steps:
acquiring data information of each medicine in a database of a hospital;
pre-classifying the medicines according to the data information of the medicines;
according to the pre-classification result, placing the data information of the medicine into a message queue of the corresponding category;
according to the type of the message queue, sequentially sending the data information of the medicines on the message queue to a corresponding standardized processing model for standardized processing to obtain the medicine information of each medicine after standardized processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
and classifying each medicine in the standard database according to the medicine information after the standardization processing.
Further, the step of sequentially sending the data information of the medicines on the message queue to a corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the medicine information of each medicine after standardized processing includes:
obtaining the attribute corresponding to each medicine according to the data information of each medicine carried in the message queue, and vectorizing the attribute to obtain the multi-dimensional coordinate $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ of the attribute feature vector;
according to the formula
[Formula image BDA0002656230730000021, not reproduced in the text]
calculating the matching degree between the attribute feature vector and each pre-stored vector; wherein $Y$ is the multi-dimensional coordinate of each pre-stored vector in the pre-stored database, $Y = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, $x_i$ represents the coordinate corresponding to the i-th attribute in the attribute feature vector, $y_i$ represents the coordinate corresponding to the i-th attribute in the pre-stored vector, and $s_i$ is the coefficient corresponding to the i-th attribute;
and obtaining target pre-stored vectors corresponding to the medicines according to the matching degree of each pre-stored vector, and endowing the medicine information corresponding to the target pre-stored vectors to the corresponding medicines.
Further, the step of sequentially sending the data information of the medicines on the message queue to a corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the medicine information of each medicine after standardized processing includes:
acquiring a first medicine number, being the number of medicines in the hospital database, and acquiring a second medicine number, being the number of medicines contained in each message queue;
calculating the proportion of each second medicine number in the first medicine number;
judging whether the proportion values include a target proportion value smaller than a preset proportion value;
if so, acquiring processing parameter data of the target standardized processing model after processing the data information of the medicines on the corresponding message queue by using the standardized processing model corresponding to the target proportion value; the target standardization processing model is one of other standardization processing models except the standardization processing model corresponding to the target proportion value;
and receiving and processing the data information of the unprocessed medicines in the message queue corresponding to the target standardization processing model according to the acquired processing parameter data of the target standardization processing model.
Further, the step of pre-classifying the drugs according to the data information of the drugs includes:
acquiring the category information of the medicine, and vectorizing the category information to obtain a first vector corresponding to the medicine;
according to the formula
[Formula image BDA0002656230730000031, not reproduced in the text]
calculating the similarity between the first vector and a second vector corresponding to each message queue, wherein [symbol image BDA0002656230730000032] represents the first vector and [symbol image BDA0002656230730000033] represents the second vector;
comparing the similarity with a preset similarity threshold of each message queue;
and placing the medicine and the data information of the medicine into the corresponding message queue according to the comparison result.
Further, the step of obtaining the category information of the drug and vectorizing the category information to obtain a first vector corresponding to the drug includes:
preprocessing the data information of the medicine; the preprocessing comprises the steps of eliminating punctuation marks, unifying languages and deleting irrelevant words and sentences in the problem according to the special character identification library, wherein the irrelevant words and sentences comprise greetings and adjectives;
reading text data of a data set according to a BERT Chinese training model, and segmenting the text data in a fine-tuning mode;
recognizing the text data after word segmentation by a semantic recognition technology, extracting category keywords in the text data, and acquiring the category keywords as the category information.
Further, the step of classifying each drug in the standard database according to the drug information after the standardization process includes:
establishing a TOKEN list and endowing each medicine with a TOKEN label;
acquiring the drug information of each drug after standardization processing, and attaching the drug information after standardization processing to the corresponding TOKEN label to form a drug label;
and inputting the medicine label into a standard database, and classifying according to the medicine label.
Further, before the step of sequentially sending the data information of the drugs on the message queue to the corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the drug information of each drug after standardized processing, the method further includes:
acquiring data information of each medicine of each hospital and sample data of standardized medicine information associated with each medicine;
according to the category information of the message queue, dividing sample data corresponding to each medicine into a plurality of sample data of different categories;
and training the corresponding standardized processing model based on the sample data of different classes.
The invention also provides a medicine classifying device, which comprises:
the data information acquisition module is used for acquiring data information of each medicine in a database of the hospital;
the medicine pre-classification module is used for pre-classifying the medicines according to the data information of the medicines;
the medicine compiling module is used for compiling the data information of the medicines into the message queues of the corresponding categories according to the pre-classification result;
the standardization processing module is used for sequentially sending the data information of the medicines on the message queue to a corresponding standardization processing model for standardization processing according to the type of the message queue to obtain the medicine information of each medicine after standardization processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
and the classification module is used for classifying and processing each medicine in the standard database according to the medicine information after the standardization processing.
The invention also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the above methods when the processor executes the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.
The invention has the beneficial effects that: by the medicine classifying method, the early preparation work of self-correction and self-check can be completed, the medicine information of each medicine is standardized, a large amount of labor investment for recoding and comparing each medicine in a hospital database in the self-correction and self-check work is reduced, the time consumption of code matching work is reduced, and errors caused by manual code matching are reduced.
Drawings
FIG. 1 is a flow chart illustrating a method for classifying a drug according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically illustrating the structure of a medicine sorting apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all directional indicators (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative position relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly, and the connection may be a direct connection or an indirect connection.
The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a method for classifying drugs, including:
S1: acquiring data information of each medicine in a database of a hospital;
S2: pre-classifying the medicines according to the data information of the medicines;
S3: according to the pre-classification result, placing the data information of the medicine into a message queue of the corresponding category;
S4: according to the type of the message queue, sequentially sending the data information of the medicines on the message queue to a corresponding standardized processing model for standardized processing to obtain the medicine information of each medicine after standardized processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
S5: and classifying each medicine in the standard database according to the medicine information after the standardization processing.
As described in step S1, the data information of each medicine in the database of the hospital is acquired. Because no fixed standard has been set, hospitals may use different data recording systems, and each recording system holds different medicine data information, which therefore needs further processing. The data information may be acquired by directly extracting the data information of all medicines contained in the hospital database. The acquired medicine data information is written into a cache database, for example a Redis database. It may be written through Kafka, a distributed publish-subscribe messaging system with high performance and high throughput, so the data information of each medicine in a large hospital database can be input quickly.
As described in step S2, each medicine may be pre-classified according to the data information of the medicine, such as the category information included in the data information, or the usage information; the pre-classification can be divided into several categories, for example, prescription drugs, non-prescription drugs, controlled drugs, and the like, and also can be traditional Chinese medicines, western medicines, and the like.
As described in step S3, each medicine and its data information may be placed into the message queue of the corresponding category according to the pre-classification result. The message queues of different categories are sent to the corresponding programs for processing, and the queuing may likewise be implemented in the Redis database.
As described in step S4, the data information of the medicines on each message queue is sent in sequence to the corresponding standardized processing model according to the type of the message queue; the data may again be sent through Kafka. It should be noted that each standardized processing model is trained on sample data of its own category. This preserves the processing effect while reducing model complexity and model size and increasing computational efficiency.
As described in step S5, the medicines are classified according to the obtained standardized medicine information, which completes the standardization of the medicines. A standard database is thereby established for the subsequent self-checking and self-correction work, so that the information of each medicine follows a specific standard, which facilitates the subsequent checking and correction and the uniform recording of medicine information.
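The flow of steps S1 to S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: in-memory deques stand in for the Redis/Kafka message queues, and the keyword table `CATEGORY_KEYWORDS` and the `normalize` function are hypothetical placeholders for the pre-classification and for the trained standardized processing models.

```python
from collections import defaultdict, deque

# Hypothetical keyword rules used only for this illustration.
CATEGORY_KEYWORDS = {
    "tcm": ("decoction", "granule"),
    "western": ("tablet", "injection"),
}

def pre_classify(record):
    """S2: choose a category from the record's free-text category field."""
    text = record["category"].lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "other"

def normalize(category, record):
    """S4: placeholder for the per-category standardized processing model."""
    return {"name": record["name"].strip().lower(), "category": category}

def classify_drugs(hospital_records):
    queues = defaultdict(deque)                    # S3: one queue per category
    for record in hospital_records:                # S1: records from the hospital database
        queues[pre_classify(record)].append(record)
    standard_db = defaultdict(list)
    for category, queue in queues.items():
        while queue:                               # S4: consume each queue in order
            info = normalize(category, queue.popleft())
            standard_db[info["category"]].append(info)  # S5: file under its class
    return dict(standard_db)

records = [
    {"name": " Aspirin ", "category": "Tablet, oral"},
    {"name": "Banlangen", "category": "Granule"},
]
result = classify_drugs(records)
```

In a deployment along the lines described above, the deques would be replaced by queues held in the cache database, and `normalize` by the model trained for each category.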
In an embodiment, the step S4, of sequentially sending the data information of the drugs in the message queue to the corresponding standardization processing model for standardization processing according to the type of the message queue to obtain the drug information of each drug after standardization processing, includes:
S401: according to the data information of each medicine carried in the message queue, acquiring the attributes corresponding to each medicine and vectorizing them to obtain the multi-dimensional coordinate $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ of the attribute feature vector;
S402: according to the formula
[Formula image BDA0002656230730000071, not reproduced in the text]
calculating the matching degree between the attribute feature vector and each pre-stored vector; wherein $Y$ is the multi-dimensional coordinate of each pre-stored vector in the pre-stored database, $Y = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, $x_i$ represents the coordinate corresponding to the i-th attribute in the attribute feature vector, $y_i$ represents the coordinate corresponding to the i-th attribute in the pre-stored vector, and $s_i$ is the coefficient corresponding to the i-th attribute;
S403: obtaining the target pre-stored vector corresponding to each medicine according to the matching degree of each pre-stored vector, and assigning the medicine information corresponding to the target pre-stored vector to the corresponding medicine.
As described in steps S401 to S403, the data information of each medicine carried in each message queue is extracted to obtain the attributes of that medicine, such as its name, category and use, and these attributes are vectorized to obtain the multi-dimensional coordinate of the medicine's attribute feature vector. The matching degree with each pre-stored vector is then calculated according to the formula. Note that $y_i$ in each pre-stored vector and $x_i$ must correspond to the same attribute. Because the attributes should not all carry the same weight, the parameter $s_i$ is introduced; $s_i$ is obtained through training and differs for different $i$. There is no functional relationship between $i$ and $s_i$; rather, $s_i$ reflects the relevance of the i-th attribute. For example, the shape of a drug is weakly relevant, so its $s_i$ is correspondingly small, whereas the use of a drug is strongly relevant, so its $s_i$ is correspondingly large. The target pre-stored vector corresponding to each medicine is then obtained according to the matching degree of each pre-stored vector, and the medicine information corresponding to the target pre-stored vector is assigned to that medicine.
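Because the formula image is not reproduced in the text, the exact expression for the matching degree is unknown. The sketch below substitutes a weighted cosine similarity purely as an assumption, to show how per-attribute coefficients $s_i$ can enter a match between $X$ and each pre-stored $Y$; the function names, the example vectors and the coefficient values are all hypothetical.

```python
import math

def matching_degree(x, y, s):
    """Weighted cosine similarity (an assumed stand-in for the patent's formula).

    x, y: attribute vectors of equal length; s: per-attribute coefficients.
    """
    num = sum(si * xi * yi for si, xi, yi in zip(s, x, y))
    norm_x = math.sqrt(sum(si * xi * xi for si, xi in zip(s, x)))
    norm_y = math.sqrt(sum(si * yi * yi for si, yi in zip(s, y)))
    return num / (norm_x * norm_y) if norm_x and norm_y else 0.0

def best_match(x, prestored, s):
    """S403: return the key of the pre-stored vector with the highest matching degree."""
    return max(prestored, key=lambda name: matching_degree(x, prestored[name], s))

# Hypothetical coefficients: the first attribute matters most, the last least.
s = [0.6, 0.3, 0.1]
prestored = {
    "aspirin tablet": [1, 0, 1],
    "ibuprofen capsule": [0, 1, 1],
}
target = best_match([1, 0, 0], prestored, s)
```

The standardized medicine information stored with the selected pre-stored vector would then be assigned to the medicine, as in step S403.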
In an embodiment, the step S4, of sequentially sending the data information of the drugs in the message queue to the corresponding standardization processing model for standardization processing according to the type of the message queue to obtain the drug information of each drug after standardization processing, includes:
S411: acquiring a first medicine number, being the number of medicines in the hospital database, and acquiring a second medicine number, being the number of medicines contained in each message queue;
S412: calculating the proportion of each second medicine number in the first medicine number;
S413: judging whether the proportion values include a target proportion value smaller than a preset proportion value;
S414: if so, acquiring processing parameter data of the target standardized processing model after processing the data information of the medicines on the corresponding message queue by using the standardized processing model corresponding to the target proportion value; the target standardization processing model is one of other standardization processing models except the standardization processing model corresponding to the target proportion value;
S415: and receiving and processing the data information of the unprocessed medicines in the message queue corresponding to the target standardization processing model according to the acquired processing parameter data of the target standardization processing model.
As described in steps S411 to S415, the first medicine number, i.e. the total number of medicines in the hospital database, is obtained, followed by the second medicine number, i.e. the number of medicines in each message queue. The medicines may be divided unevenly during pre-classification, so that some message queues hold a particularly large number of medicines and others a particularly small number. Some standardized processing models would then sit idle during the calculation, and the computing capacity of the program would not be fully used. Therefore, the standardized processing model whose proportion value is a target proportion value smaller than the preset proportion value is identified from the second medicine number of each message queue. After this model has finished processing the medicines on its own message queue, it acquires the processing parameter data of the target standardized processing model and helps that model process the data information of the unprocessed medicines in its message queue. For example, the target standardized processing model may receive and process the data information in the message queue in order while the identified model receives and processes it in reverse order, until the data information of all medicines has been processed. The target standardized processing model may be any other standardized processing model whose proportion value is not smaller than the preset proportion value. Subsequently, any standardized processing model that becomes idle may likewise acquire the processing parameters of a model that is still processing its message queue and help it process the data information of the remaining medicines, thereby increasing the processing efficiency.
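The load-sharing idea of steps S411 to S415 can be sketched as follows, as an illustration only. It assumes that every medicine has been queued, so the total across queues equals the first medicine number, and it simulates the two cooperating models by alternating turns in a single thread; the names `light_queues` and `drain_from_both_ends` and the `process` callable are hypothetical.

```python
from collections import deque

def light_queues(queues, preset_ratio):
    """S411-S413: names of queues whose share of all medicines is below preset_ratio."""
    total = sum(len(q) for q in queues.values())   # first medicine number (assumed)
    return [name for name, q in queues.items()
            if total and len(q) / total < preset_ratio]

def drain_from_both_ends(queue, process):
    """S414-S415: the owning model consumes from the head in order, while the
    helper, applying the same processing parameters, consumes from the tail."""
    results = []
    while queue:
        results.append(("owner", process(queue.popleft())))
        if queue:
            results.append(("helper", process(queue.pop())))
    return results

queues = {
    "western": deque(["a", "b", "c", "d", "e"]),
    "tcm": deque(["x"]),
}
helpers = light_queues(queues, preset_ratio=0.2)   # models expected to finish early
shared = drain_from_both_ends(queues["western"], str.upper)
```

Consuming from opposite ends means the two workers never contend for the same item until they meet in the middle, which is one plausible reading of the in-order and reverse-order processing described above.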
In one embodiment, the step S2 of pre-classifying the medicine according to the data information of the medicine includes:
S201: acquiring the category information of the medicine, and vectorizing the category information to obtain a first vector corresponding to the medicine;
S202: according to the formula
[Formula image BDA0002656230730000091, not reproduced in the text]
calculating the similarity between the first vector and a second vector corresponding to each message queue, wherein [symbol image BDA0002656230730000092] represents the first vector and [symbol image BDA0002656230730000093] represents the second vector;
S203: comparing the similarity with a preset similarity threshold of each message queue;
S204: and placing the medicine and the data information of the medicine into the corresponding message queue according to the comparison result.
As described in steps S201 to S204, pre-classification may be performed on the basis of the category information of each medicine. Because the descriptions of the category information may not be fully consistent, the category information is vectorized; the similarity between the resulting first vector and the second vector corresponding to each message queue is then calculated by the formula and compared with the preset similarity threshold of each message queue. The medicine and its data information are then placed into the corresponding message queue according to the comparison result, which completes the queuing.
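The similarity formula is likewise not reproduced in the text. The sketch below assumes plain cosine similarity and a simple rule (choose the most similar queue among those whose threshold is met); both are assumptions for illustration, as are the queue names, vectors and thresholds.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(first_vector, queue_vectors, thresholds):
    """S202-S204: pick the queue whose similarity meets its own threshold;
    if several qualify, take the most similar; if none does, return None."""
    best_name, best_sim = None, -1.0
    for name, second_vector in queue_vectors.items():
        sim = cosine_similarity(first_vector, second_vector)
        if sim >= thresholds[name] and sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

queue_vectors = {"prescription": [1.0, 0.0], "otc": [0.0, 1.0]}
thresholds = {"prescription": 0.8, "otc": 0.8}
chosen = route([0.9, 0.1], queue_vectors, thresholds)
unmatched = route([1.0, 1.0], queue_vectors, thresholds)
```

How a medicine that clears no threshold should be handled is not stated in the text; returning `None` here simply leaves that decision to the caller.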
In an embodiment, the step S201 of obtaining the category information of the drug and vectorizing the category information to obtain a first vector corresponding to the drug includes:
S2011: preprocessing the data information of the medicine; the preprocessing comprises the steps of eliminating punctuation marks, unifying languages and deleting irrelevant words and sentences in the problem according to the special character identification library, wherein the irrelevant words and sentences comprise greetings and adjectives;
S2012: reading text data of a data set according to a BERT Chinese training model, and segmenting the text data in a fine-tuning mode;
S2013: recognizing the text data after word segmentation by a semantic recognition technology, extracting category keywords in the text data, and acquiring the category keywords as the category information.
As described in steps S2011 to S2013, the data information is preprocessed to reduce errors in the subsequent calculations on the generated vectors. The text data of the data set is then read by the BERT Chinese training model and segmented into words in a fine-tuning manner; the BERT Chinese training model is trained on a professional lexicon, which may itself be a generated data set. The segmented text data is then recognized by a semantic recognition technique, the category keywords in it are extracted and taken as the category information, and the category of the medicine is analyzed and calculated from these keywords.
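The preprocessing and keyword extraction of steps S2011 to S2013 can be illustrated with a deliberately simplified sketch. It does not use BERT: whitespace splitting stands in for the model-based word segmentation, and set membership stands in for semantic recognition. The word lists `IRRELEVANT` and `CATEGORY_TERMS` are hypothetical.

```python
import re

# Hypothetical lists for illustration only.
IRRELEVANT = {"hello", "thanks", "good", "new"}          # greetings and adjectives
CATEGORY_TERMS = {"prescription", "otc", "injection", "tablet"}

def preprocess(text):
    """S2011: strip punctuation, unify case, drop irrelevant words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Whitespace split is a stand-in for BERT-based segmentation (S2012).
    return [word for word in cleaned.split() if word not in IRRELEVANT]

def category_keywords(text):
    """S2013: keep only the words found in the category lexicon."""
    return [word for word in preprocess(text) if word in CATEGORY_TERMS]

keywords = category_keywords("Hello, new Prescription tablet!")
```

The extracted keywords would then be vectorized to form the first vector used in step S201.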
In one embodiment, the step S5 of classifying each drug in the standard database according to the drug information after the standardization process includes:
S501: establishing a TOKEN list and assigning each medicine a TOKEN label;
S502: acquiring the drug information of each drug after standardization processing, and attaching the drug information after standardization processing to the corresponding TOKEN label to form a drug label;
S503: and inputting the medicine label into a standard database, and classifying according to the medicine label.
As described in steps S501 to S503, to make the subsequent division of the medicines more convenient, a TOKEN list may be created and each medicine assigned a TOKEN label based on that list. Because a TOKEN label is unique and difficult to copy, each medicine is marked by its TOKEN label. The obtained medicine information is then attached to the TOKEN label, establishing a correspondence among the medicine, the TOKEN label and the medicine information, after which the medicines can be classified directly according to their TOKEN labels.
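A minimal sketch of steps S501 to S503 follows. The text does not say how TOKEN labels are generated; random UUIDs are used here as an assumption because they are unique and hard to guess. The record layout and the function names are hypothetical.

```python
import uuid
from collections import defaultdict

def build_labels(standardized):
    """S501-S502: assign each medicine a token and attach its standardized info."""
    token_list = {}
    for drug_id, info in standardized.items():
        token = uuid.uuid4().hex                       # assumed token scheme
        token_list[token] = {"drug": drug_id, **info}  # the medicine label
    return token_list

def classify_by_label(token_list):
    """S503: group tokens by the category recorded on each medicine label."""
    by_class = defaultdict(list)
    for token, label in token_list.items():
        by_class[label["category"]].append(token)
    return dict(by_class)

labels = build_labels({
    "h1-001": {"name": "aspirin", "category": "western"},
    "h1-002": {"name": "banlangen", "category": "tcm"},
})
classes = classify_by_label(labels)
```

Looking up a token in `labels` returns the medicine and its standardized information, which is the correspondence described above.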
In an embodiment, before the step S4 of sequentially sending the data information of the drugs on the message queue to the corresponding standardization processing model for standardization processing according to the type of the message queue to obtain the drug information of each drug after standardization processing, the method further includes:
S301: acquiring data information of each medicine of each hospital and sample data of standardized medicine information associated with each medicine;
S302: according to the category information of the message queue, dividing sample data corresponding to each medicine into a plurality of sample data of different categories;
S303: and training the corresponding standardized processing model based on the sample data of different classes.
As described in steps S301 to S303, the data information of each medicine in each hospital and the sample data of the corresponding standardized medicine information are obtained. The sample data corresponding to each medicine is then divided into sample data of different categories according to the category information of the message queues. Each standardized processing model is trained only on the sample data of its own category, so that it fits that category more closely, and the trained standardized processing models of the different categories are obtained.
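The division and per-category training in steps S302-S303 might be sketched as below. This is only a minimal illustration: the "model" here is a plain lookup table that memorises pairs of raw and standardized names, which is a stand-in chosen for this sketch and not the standardized processing model of this application, and the sample records are made up.

```python
from collections import defaultdict


def split_samples_by_category(samples):
    """S302: divide (raw, standardized, category) samples by message-queue category."""
    by_category = defaultdict(list)
    for raw, standardized, category in samples:
        by_category[category].append((raw, standardized))
    return dict(by_category)


def train_lookup_model(category_samples):
    """S303 stand-in: 'training' only memorises normalised raw name -> standardized name."""
    return {raw.strip().lower(): standardized for raw, standardized in category_samples}


# Made-up sample data.
samples = [
    ("Amoxicillin Caps 0.25g", "amoxicillin capsule 250 mg", "western"),
    ("amoxicillin caps 0.25g ", "amoxicillin capsule 250 mg", "western"),
    ("Banlangen Keli", "isatis root granules", "traditional_chinese"),
]
models = {
    category: train_lookup_model(category_samples)
    for category, category_samples in split_samples_by_category(samples).items()
}
```

Each category ends up with its own, smaller model, which is the property the paragraph above relies on.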
Referring to fig. 2, the present invention provides a device for classifying a medicine, including:
the data information acquisition module 10 is used for acquiring data information of each medicine in a database of a hospital;
a medicine pre-classification module 20, configured to pre-classify the medicines according to the data information of the medicines;
a medicine compiling module 30, configured to compile data information of the medicine into a message queue of a corresponding category according to a pre-classification result;
the standardization processing module 40 is configured to sequentially send the data information of the drugs on the message queue to a corresponding standardization processing model for standardization processing according to the type of the message queue, so as to obtain drug information of each drug after standardization processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
and the classification module 50 is used for classifying and processing each medicine in the standard database according to the medicine information after the standardization processing.
The data information of each medicine in the hospital database is obtained. Because no fixed standard has been set, each hospital may use a different data recording system, and different recording systems hold different medicine data information, so this differing data information needs further processing. The data information may be obtained by directly extracting the data information of all medicines contained in the hospital database. The acquired medicine data information is input into a cache database, for example a REDIS database. The input may be performed through KAFKA, a distributed publish-subscribe messaging system with high performance and high throughput, so that the data information of each medicine in a hospital's huge database can be input quickly.
The medicines can be pre-classified according to the data information of the medicines, such as category information contained in the data information, or usage information; the pre-classification can be divided into several categories, for example, prescription drugs, non-prescription drugs, controlled drugs, and the like, and also can be traditional Chinese medicines, western medicines, and the like.
According to the pre-classification result, each medicine and its data information are compiled into the message queue of the corresponding category, where the message queues of different categories are sent to their corresponding programs for processing; the compiling may likewise be carried out in the REDIS database.
According to the type of the message queue, the data information of the medicines on the message queue is sent in sequence to the corresponding standardized processing model; the sending may also be performed through KAFKA. It should be noted that each standardized processing model is trained on the sample data of its corresponding type, so that it achieves the same processing effect with a simpler and smaller model and runs more efficiently.
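The flow from pre-classification to queueing to dispatch described above can be sketched as follows. The sketch is an assumption-laden illustration: in-memory `deque` objects stand in for the REDIS/KAFKA message queues, `pre_classify` is a hypothetical rule that simply reads a `category` field, and the two lambda "models" are placeholders for the trained standardized processing models.

```python
from collections import defaultdict, deque


def pre_classify(record):
    # Hypothetical rule: use the category recorded in the data information, if any.
    return record.get("category", "uncategorized")


def enqueue(records):
    """Compile each record into the message queue of its pre-classified category."""
    queues = defaultdict(deque)
    for record in records:
        queues[pre_classify(record)].append(record)
    return queues


def dispatch(queues, models):
    """Send the records of each queue, in order, to the model for that queue's category."""
    results = []
    for category, queue in queues.items():
        model = models[category]  # a category without a model raises KeyError in this sketch
        while queue:
            results.append(model(queue.popleft()))  # first in, first out
    return results


# Placeholder models and made-up records.
models = {
    "prescription": lambda r: {"name": r["name"].lower(), "class": "Rx"},
    "non-prescription": lambda r: {"name": r["name"].lower(), "class": "OTC"},
}
records = [
    {"name": "Amoxicillin", "category": "prescription"},
    {"name": "Vitamin C", "category": "non-prescription"},
    {"name": "Cefalexin", "category": "prescription"},
]
queues = enqueue(records)
standardized = dispatch(queues, models)
```

Keeping one queue per category is what allows each record to reach only the model trained for that category.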
The medicines are classified according to the medicine information obtained after standardization processing, which completes the standardization of the medicines. A standard database is thereby established for the subsequent self-checking and self-correcting work, so that the medicine information of each medicine follows a specific standard, which facilitates the subsequent checking work and the unified recording of medicine information.
In one embodiment, the normalization processing module 40 includes:
an attribute vectorization sub-module, configured to obtain, according to the data information of each drug carried in the message queue, the attribute corresponding to each drug, and to vectorize the obtained attribute to obtain the multidimensional coordinate of the attribute feature vector, $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$;
a matching degree calculation module, configured to calculate, according to the formula
Figure BDA0002656230730000131
the matching degree between the attribute feature vector and each pre-stored vector; wherein $Y$ is the multidimensional coordinate of each pre-stored vector in the pre-stored database, $Y = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, $x_i$ represents the coordinate corresponding to the $i$-th attribute in the attribute feature vector, $y_i$ represents the coordinate corresponding to the $i$-th attribute in the pre-stored vector, and $s_i$ is the coefficient corresponding to the $i$-th attribute;
and the target pre-stored vector endowing module is used for obtaining the target pre-stored vectors corresponding to the medicines according to the matching degree of each pre-stored vector and endowing the medicine information corresponding to the target pre-stored vectors to the corresponding medicines.
The data information corresponding to each medicine carried in each message queue is extracted to obtain the attributes of each medicine, such as its name, category, and use, and these attributes are vectorized to obtain the multidimensional coordinates of the medicine's attribute feature vector. The matching degree with each pre-stored vector is then calculated according to the formula. Note that $y_i$ in each pre-stored vector and $x_i$ correspond to the same attribute. Because each attribute should carry a different weight, the parameter $s_i$ is introduced. The parameter $s_i$ is obtained through training and differs for different $i$; there is no functional relationship between $i$ and $s_i$, rather $s_i$ depends on how relevant the $i$-th attribute is. For example, the shape of a medicine is only weakly relevant, so its parameter $s_i$ is correspondingly small, whereas the use of a medicine is strongly relevant, so its parameter $s_i$ is correspondingly large. Finally, the target pre-stored vector corresponding to each medicine is obtained according to the matching degree of each pre-stored vector, and the medicine information corresponding to the target pre-stored vector is assigned to the corresponding medicine.
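The matching formula itself appears only as an image in this text and is not reproduced here. Purely as a sketch of how per-attribute coefficients $s_i$ can enter a matching degree, the following Python example assumes a weighted Euclidean distance mapped to a score in (0, 1]; this choice, the function names, and the example vectors are all assumptions and may differ from the formula of this application.

```python
import math


def matching_degree(x, y, s):
    """Assumed stand-in: 1 / (1 + weighted Euclidean distance between x and y)."""
    if not (len(x) == len(y) == len(s)):
        raise ValueError("x, y and s must have the same number of attributes")
    distance = math.sqrt(sum(si * (xi - yi) ** 2 for xi, yi, si in zip(x, y, s)))
    return 1.0 / (1.0 + distance)


def best_prestored(x, prestored, s):
    """Return the name of the pre-stored vector with the highest matching degree."""
    return max(prestored, key=lambda name: matching_degree(x, prestored[name], s))


# Made-up weights: use (large), category (medium), shape (small).
s = [0.6, 0.3, 0.1]
prestored = {
    "amoxicillin capsule": [1.0, 0.0, 0.2],
    "vitamin C tablet": [0.0, 1.0, 0.9],
}
x = [0.9, 0.1, 0.8]
target = best_prestored(x, prestored, s)
```

In this example the third attribute differs strongly from the first pre-stored vector, but its small coefficient keeps that difference from dominating the result.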
In one embodiment, the normalization processing module 40 includes:
the medicine number acquisition submodule is used for acquiring a first medicine number in the hospital database and acquiring a second medicine number contained in each message queue;
the proportion value calculation submodule is used for calculating the proportion value of each second medicine number in the first medicine number;
the target proportion value judgment submodule is used for judging whether each proportion value has a target proportion value smaller than a preset proportion value;
the processing parameter data acquisition submodule is used for acquiring the processing parameter data of the target standardized processing model after the data information of the medicines on the corresponding message queue is processed by the standardized processing model corresponding to the target proportion value if the target proportion value is smaller than the preset proportion value; the target standardization processing model is one of other standardization processing models except the standardization processing model corresponding to the target proportion value;
and the processing submodule is used for receiving and processing the data information of the unprocessed medicines in the message queue corresponding to the target standardization processing model according to the acquired processing parameter data of the target standardization processing model.
The first medicine number, i.e. the total number of medicines in the hospital database, is acquired, followed by the second medicine number, i.e. the number of medicines in each message queue. The pre-classification may divide the medicines unevenly, so that some message queues contain very many medicines and others very few. During calculation, some standardized processing models would then sit idle and the computing capacity of the program would not be fully used. Therefore, the proportion of each second medicine number in the first medicine number is calculated, and the standardized processing models whose proportion value (the target proportion value) is smaller than the preset proportion value are picked out. After such a model has finished processing the medicines on its own message queue, it acquires the processing parameter data of a target standardized processing model and helps that model process the data information of the unprocessed medicines in the target model's message queue. For example, the target standardized processing model may receive and process the data information of the medicines in the message queue in sequence, while the picked-out model receives and processes it in reverse order, until the data information of all medicines has been processed. The target standardized processing model may be any other standardized processing model whose proportion value is not smaller than the preset proportion value. Subsequently, any standardized processing model that becomes idle may likewise acquire the processing parameters of a model that is still processing the data information of the medicines in its message queue and help it process the data information of the remaining medicines, thereby increasing the processing efficiency.
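The proportion check and the front-and-back processing of a long queue could be sketched as follows. This is a single-threaded simulation under stated assumptions: the function names are hypothetical, a `deque` stands in for the message queue, and one `model` callable represents both the target model and the helper once the helper has acquired the target model's processing parameter data.

```python
from collections import deque


def light_queues(queues, first_number, preset):
    """Names of queues whose share of the first medicine number is below the preset proportion."""
    return [name for name, q in queues.items() if len(q) / first_number < preset]


def process_with_helper(queue, model):
    """Target model takes items from the front, the helper takes items from the back."""
    front, back = [], []
    while queue:
        front.append(model(queue.popleft()))  # target model, in sequence
        if queue:
            back.append(model(queue.pop()))   # helper model, in reverse order
    return front + back[::-1]                 # results restored to queue order


# Made-up queue sizes: 8 + 2 medicines out of 10 in the hospital database.
queues = {"western": deque(range(8)), "traditional_chinese": deque(range(2))}
first_number = 10
idle_candidates = light_queues(queues, first_number, preset=0.3)
processed = process_with_helper(queues["western"], lambda n: n * 10)
```

Working from both ends means the two models never need to coordinate on which record comes next; they simply stop when the queue is empty.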
In one embodiment, the drug pre-sorting module 20 includes:
the category information acquisition submodule is used for acquiring category information of the medicine and vectorizing the category information to obtain a first vector corresponding to the medicine;
a similarity calculation submodule, configured to calculate, according to the formula
Figure BDA0002656230730000151
Figure BDA0002656230730000152
the similarity between the first vector and the second vector corresponding to each message queue; wherein
Figure BDA0002656230730000153
represents the first vector and
Figure BDA0002656230730000154
represents the second vector;
a similarity comparison submodule, configured to compare the similarity with the preset similarity threshold of each message queue;
and the data information compiling module is used for compiling the medicine and the data information of the medicine into the corresponding message queue according to the comparison result.
The pre-classification may be based on the category information recorded for each medicine when it was put in stock. Because the descriptions of the category information may not be fully consistent, the category information is vectorized, the similarity between the resulting first vector and the second vector corresponding to each message queue is calculated by the formula, and the similarity is compared with the similarity threshold of each message queue. The medicine and its data information are then compiled into the corresponding message queue according to the comparison result, which completes the compiling.
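The similarity formula is given only as an image in this text. As a hedged illustration of the comparison against per-queue thresholds, the sketch below assumes cosine similarity, and it further assumes that when several queues pass their threshold the one with the highest similarity is chosen; neither assumption is confirmed by this application, and the vectors and thresholds are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed similarity measure)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    return dot / norm if norm else 0.0


def choose_queue(first_vector, queue_vectors, thresholds):
    """Return the queue whose second vector is most similar and passes its threshold, else None."""
    best_name, best_similarity = None, -1.0
    for name, second_vector in queue_vectors.items():
        similarity = cosine_similarity(first_vector, second_vector)
        if similarity >= thresholds[name] and similarity > best_similarity:
            best_name, best_similarity = name, similarity
    return best_name


queue_vectors = {"prescription": [1.0, 0.0], "non-prescription": [0.0, 1.0]}
thresholds = {"prescription": 0.8, "non-prescription": 0.8}
chosen = choose_queue([0.9, 0.1], queue_vectors, thresholds)
unmatched = choose_queue([0.7, 0.7], queue_vectors, thresholds)
```

A medicine whose vector passes no threshold is returned as `None` here, leaving it to the caller to decide how such records are handled.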
In one embodiment, the category information acquisition sub-module includes:
the preprocessing unit is used for preprocessing the data information of the medicine; the preprocessing comprises the steps of eliminating punctuation marks, unifying languages and deleting irrelevant words and sentences in the problem according to the special character identification library, wherein the irrelevant words and sentences comprise greetings and adjectives;
the word segmentation unit is used for reading text data of the data set according to the BERT Chinese training model and segmenting the text data in a fine-tuning mode;
and the category keyword extraction unit is used for identifying the text data after word segmentation through a semantic identification technology, extracting category keywords in the text data, and acquiring the category keywords as the category information.
The data information is first preprocessed to reduce errors in the vectors generated for subsequent calculation. The text data of the data set is then read by the BERT Chinese training model, and the text data is segmented in a fine-tuning manner; the BERT Chinese training model is trained on the professional lexicon, and the professional lexicon may itself be a generated data set. Finally, the segmented text data is recognized by a semantic recognition technology, the category keywords in the text data are extracted and taken as the category information, and the category of the medicine is analyzed and calculated based on the category keywords.
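The preprocessing and keyword extraction can be illustrated with a deliberately simple sketch. It covers only punctuation removal and deletion of irrelevant words; the word list and the keyword vocabulary are hypothetical, lower-casing is used in place of language unification, and whitespace splitting with a vocabulary lookup stands in for the BERT-based word segmentation and semantic recognition, which are not reproduced here.

```python
import re

# Hypothetical list of irrelevant words (greetings, adjectives).
IRRELEVANT_WORDS = {"hello", "please", "excellent"}


def preprocess(text):
    """Lower-case, replace punctuation with spaces, and drop irrelevant words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # anything that is not a word character or whitespace
    words = [w for w in text.split() if w not in IRRELEVANT_WORDS]
    return " ".join(words)


def extract_category_keywords(cleaned_text, vocabulary):
    """Stand-in for semantic recognition: keep the words found in a keyword vocabulary."""
    return [w for w in cleaned_text.split() if w in vocabulary]


cleaned = preprocess("Hello, excellent Amoxicillin capsules (prescription)!")
keywords = extract_category_keywords(cleaned, {"prescription", "injection"})
```

Cleaning the text before vectorization keeps punctuation and filler words from contributing to the vectors used in the later similarity calculation.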
In one embodiment, the classification module 50 includes:
a TOKEN list establishing submodule for establishing a TOKEN list and assigning a TOKEN label to each medicine;
the drug information acquisition submodule is used for acquiring the drug information of each drug after the standardization processing, and attaching the drug information after the standardization processing to the corresponding TOKEN label to form a drug label;
and the classification processing submodule is used for inputting the medicine label into a standard database and performing classification processing according to the medicine label.
To make the subsequent division of the medicines more convenient, a TOKEN list may be established and each medicine assigned a TOKEN label based on that list. Because a TOKEN label is unique and difficult to copy, it is used to mark each medicine. The acquired medicine information is then attached to the TOKEN label, which establishes a correspondence among the medicine, its TOKEN label, and its medicine information.
In one embodiment, the apparatus for classifying a medicine further includes:
the sample data acquisition module is used for acquiring data information of each medicine of each hospital and sample data of standardized medicine information associated with each medicine;
the sample data dividing module is used for dividing the sample data corresponding to each medicine into a plurality of sample data of different types according to the type information of the message queue;
and the standard processing model training module is used for training the corresponding standard processing model based on the sample data of different types.
The data information of each medicine of each hospital and the sample data of the corresponding standardized medicine information are acquired. The sample data corresponding to each medicine is then divided into sample data of different types according to the category information of the message queues. Each standardized processing model is trained only on the sample data of its own type, so that it fits that type more closely, and the trained standardized processing models of the different types are obtained.
The invention has the beneficial effects that: by the medicine classifying method, the early preparation work of self-correction and self-check can be completed, the medicine information of each medicine is standardized, a large amount of labor investment for recoding and comparing each medicine in a hospital database in the self-correction and self-check work is reduced, the time consumption of code matching work is reduced, and errors caused by manual code matching are reduced.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the data information of the medicines and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, may implement the method for classifying a drug described in any of the above embodiments.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
The embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the method for classifying a drug according to any of the embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A method for classifying a pharmaceutical product, comprising:
acquiring data information of each medicine in a database of a hospital;
pre-classifying the medicines according to the data information of the medicines;
according to the pre-classification result, compiling the data information of the medicine into a message queue of a corresponding category;
according to the type of the message queue, sequentially sending the data information of the medicines on the message queue to a corresponding standardized processing model for standardized processing to obtain the medicine information of each medicine after standardized processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
and classifying each medicine in the standard database according to the medicine information after the standardization processing.
2. The method for classifying drugs according to claim 1, wherein the step of sequentially sending the data information of the drugs on the message queue to the corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the drug information of each drug after standardized processing comprises:
obtaining the attribute corresponding to each medicine according to the data information of each medicine carried in the message queue, and vectorizing the attribute to obtain the multidimensional coordinate of the attribute feature vector, $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$;
according to the formula
Figure FDA0002656230720000011
calculating the matching degree between the attribute feature vector and each pre-stored vector; wherein $Y$ is the multidimensional coordinate of each pre-stored vector in the pre-stored database, $Y = (y_1, y_2, \ldots, y_i, \ldots, y_n)$, $x_i$ represents the coordinate corresponding to the $i$-th attribute in the attribute feature vector, $y_i$ represents the coordinate corresponding to the $i$-th attribute in the pre-stored vector, and $s_i$ is the coefficient corresponding to the $i$-th attribute;
and obtaining target pre-stored vectors corresponding to the medicines according to the matching degree of each pre-stored vector, and endowing the medicine information corresponding to the target pre-stored vectors to the corresponding medicines.
3. The method for classifying drugs according to claim 1, wherein the step of sequentially sending the data information of the drugs on the message queue to the corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the drug information of each drug after standardized processing comprises:
acquiring a first medicine number in the hospital database and acquiring a second medicine number contained in each message queue;
calculating the proportion value of each second medicine number in the first medicine number;
judging whether each proportional value has a target proportional value smaller than a preset proportional value;
if so, acquiring processing parameter data of the target standardized processing model after processing the data information of the medicines on the corresponding message queue by using the standardized processing model corresponding to the target proportion value; the target standardization processing model is one of other standardization processing models except the standardization processing model corresponding to the target proportion value;
and receiving and processing the data information of the unprocessed medicines in the message queue corresponding to the target standardization processing model according to the acquired processing parameter data of the target standardization processing model.
4. The method for classifying a drug according to claim 1, wherein the step of pre-classifying the drug according to the data information of the drug comprises:
acquiring the category information of the medicine, and vectorizing the category information to obtain a first vector corresponding to the medicine;
according to the formula
Figure FDA0002656230720000021
Calculating the similarity between the first vector and a second vector corresponding to each message queue; wherein, the
Figure FDA0002656230720000022
Represents a first vector, said
Figure FDA0002656230720000023
Representing a second vector;
comparing the similarity with a preset similarity threshold value of each message queue according to the similarity;
and according to the comparison result, compiling the medicine and the data information of the medicine into the corresponding message queue.
5. The method for classifying a drug according to claim 4, wherein the step of obtaining the class information of the drug and vectorizing the class information to obtain the first vector corresponding to the drug comprises:
preprocessing the data information of the medicine; the preprocessing comprises the steps of eliminating punctuation marks, unifying languages and deleting irrelevant words and sentences in the problem according to the special character identification library, wherein the irrelevant words and sentences comprise greetings and adjectives;
reading text data of a data set according to a BERT Chinese training model, and segmenting the text data in a fine-tuning mode;
recognizing the text data after word segmentation by a semantic recognition technology, extracting category keywords in the text data, and acquiring the category keywords as the category information.
6. The method for classifying drugs according to claim 1, wherein the step of classifying each drug in the standard database based on the drug information after the standardization process comprises:
establishing a TOKEN list and endowing each medicine with a TOKEN label;
acquiring the drug information of each drug after standardization processing, and attaching the drug information after standardization processing to the corresponding TOKEN label to form a drug label;
and inputting the medicine label into a standard database, and classifying according to the medicine label.
7. The method for classifying drugs according to claim 1, wherein before the step of sequentially sending the data information of the drugs on the message queue to the corresponding standardized processing model for standardized processing according to the type of the message queue to obtain the drug information of each drug after standardized processing, the method further comprises:
acquiring data information of each medicine of each hospital and sample data of standardized medicine information associated with each medicine;
according to the category information of the message queue, dividing sample data corresponding to each medicine into a plurality of sample data of different categories;
and training the corresponding standardized processing model based on the sample data of different classes.
8. A drug sorting device, comprising:
the data information acquisition module is used for acquiring data information of each medicine in a database of the hospital;
the medicine pre-classification module is used for pre-classifying the medicines according to the data information of the medicines;
the medicine compiling module is used for compiling the data information of the medicines into the message queues of the corresponding categories according to the pre-classification result;
the standardization processing module is used for sequentially sending the data information of the medicines on the message queue to a corresponding standardization processing model for standardization processing according to the type of the message queue to obtain the medicine information of each medicine after standardization processing; the standardized processing model is trained on sample data consisting of data information of each medicine collected in advance in each hospital and standardized medicine information associated with the medicine data information;
and the classification module is used for classifying and processing each medicine in the standard database according to the medicine information after the standardization processing.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010888451.9A 2020-08-28 2020-08-28 Medicine classification method and device and computer equipment Pending CN112035664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010888451.9A CN112035664A (en) 2020-08-28 2020-08-28 Medicine classification method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN112035664A true CN112035664A (en) 2020-12-04

Family

ID=73586823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010888451.9A Pending CN112035664A (en) 2020-08-28 2020-08-28 Medicine classification method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN112035664A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569996A (en) * 2021-08-30 2021-10-29 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for classifying medical record information
CN115359925A (en) * 2022-10-20 2022-11-18 阿里巴巴(中国)有限公司 Medicine collection method, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781298A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Medicine classification method and device, computer equipment and storage medium
WO2020048264A1 (en) * 2018-09-03 2020-03-12 平安医疗健康管理股份有限公司 Method and apparatus for processing drug data, computer device, and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020048264A1 (en) * 2018-09-03 2020-03-12 平安医疗健康管理股份有限公司 Method and apparatus for processing drug data, computer device, and storage medium
CN110781298A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Medicine classification method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
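Hu Xiaoyan (胡小燕): "Construction and Application of a Drug Attribute Classification Knowledge Base", Journal of Traditional Chinese Medicine Management (中医药管理杂志), no. 15 *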

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569996A (en) * 2021-08-30 2021-10-29 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for classifying medical record information
CN113569996B (en) * 2021-08-30 2024-05-07 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for classifying medical records information
CN115359925A (en) * 2022-10-20 2022-11-18 阿里巴巴(中国)有限公司 Medicine collection method, equipment and storage medium

Similar Documents

Publication Publication Date Title
US7373291B2 (en) Linguistic support for a recognizer of mathematical expressions
CN109800307B (en) Product evaluation analysis method and device, computer equipment and storage medium
CN110021439A (en) Medical data classification method, device and computer equipment based on machine learning
CN110765265A (en) Information classification extraction method and device, computer equipment and storage medium
US9652695B2 (en) Label consistency for image analysis
CN114996463B (en) Intelligent classification method and device for cases
CN111507089B (en) Document classification method and device based on deep learning model and computer equipment
CN110781677B (en) Medicine information matching processing method and device, computer equipment and storage medium
CN112035664A (en) Medicine classification method and device and computer equipment
CN112015878A (en) Method and device for processing unanswered questions of intelligent customer service and computer equipment
CN111860669A (en) Training method and device of OCR recognition model and computer equipment
CN113849648A (en) Classification model training method and device, computer equipment and storage medium
CN111191028A (en) Sample labeling method and device, computer equipment and storage medium
CN112347254A (en) News text classification method and device, computer equipment and storage medium
CN112364163A (en) Log caching method and device and computer equipment
CN114139537A (en) Word vector generation method and device
CN112380848B (en) Text generation method, device, equipment and storage medium
CN112036478A (en) Identification method and device for chronic disease reimbursement medicine and computer equipment
CN112989022B (en) Intelligent virtual text selection method and device and computer equipment
CN116340512A (en) False comment identification method, device, equipment and medium
CN113095073B (en) Corpus tag generation method and device, computer equipment and storage medium
CN111178070B (en) Word sequence obtaining method and device based on word segmentation and computer equipment
CN113408296A (en) Text information extraction method, device and equipment
CN114860894A (en) Method and device for querying knowledge base, computer equipment and storage medium
Armstrong Development and properties of kernel-based methods for the interpretation and presentation of forensic evidence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220525

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An Medical and Health Technology Service Co., Ltd.

Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000

Applicant before: Ping An Medical and Healthcare Management Co., Ltd.