CN113284577A - Medicine prediction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113284577A
Authority
CN
China
Prior art keywords
inquiry
data
historical
medicine
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110566394.7A
Other languages
Chinese (zh)
Other versions
CN113284577B (en)
Inventor
吴汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202110566394.7A (CN113284577B)
Publication of CN113284577A
Priority to PCT/CN2022/088787 (WO2022247549A1)
Application granted
Publication of CN113284577B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G16H20/13 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients delivered from dispensers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a medicine prediction method, device, equipment and storage medium, which address the low accuracy of prior-art medicine prediction methods. The method comprises the following steps: obtaining a plurality of historical inquiry records from authorized historical inquiry data, and extracting first inquiry characteristics from the historical inquiry records; counting the number of historical inquiry records corresponding to each first inquiry characteristic, and generating distribution data of each first inquiry characteristic in the historical inquiry data; cleaning the historical inquiry data, and training a preset deep learning tool on an inquiry data training set constructed according to the distribution data to obtain a medicine prediction model; acquiring an inquiry information text based on a medicine prediction request, extracting second inquiry characteristics from the inquiry information text, and inputting the second inquiry characteristics into the medicine prediction model to obtain a medicine prediction result. The invention also relates to blockchain technology: information related to the medicine prediction can be stored in a blockchain.

Description

Medicine prediction method, device, equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a medicine prediction method, a device, equipment and a storage medium.
Background
When a patient sees a doctor, the doctor must assess the patient's condition comprehensively from the patient's description of the condition and the examination results, and select medicine for the patient accordingly. With the development of artificial intelligence, various industries are gradually adopting it to assist or replace people in simple tasks; for example, medicine prediction is performed based on the inquiry information, and a doctor or a patient may select a medicine based on the predicted result.
However, existing medicine prediction methods require training on data, and current training processes do not take into account that the original inquiry data in the data set has a certain regularity and specificity. Instead, the inquiry data is cleaned directly, which damages the regularity and specificity of the data set to some extent during data processing. As a result, the recommendations of the trained medicine prediction model are inaccurate, which reduces the accuracy of medicine prediction.
Disclosure of Invention
The invention mainly aims to solve the technical problem that the medicine prediction method in the prior art is low in prediction accuracy.
The invention provides a medicine prediction method in a first aspect, which comprises the following steps: obtaining authorized historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records; counting historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number; cleaning the historical inquiry records corresponding to the first inquiry characteristics, and forming an inquiry data training set by the cleaned historical inquiry records and the corresponding distribution data; training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model; after a medicine prediction request is received, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting a second inquiry characteristic in the inquiry information text; and inputting the second inquiry characteristics into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry characteristics.
Optionally, in a first implementation manner of the first aspect of the present invention, the obtaining authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data includes: obtaining a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data; extracting inquiry information characteristics and medicine using information in the historical inquiry character string data, and calculating a correlation coefficient between the inquiry information characteristics and the medicine using information; screening out the inquiry information characteristics of which the correlation coefficients meet the preset correlation coefficient conditions to obtain first inquiry characteristics.
Optionally, in a second implementation manner of the first aspect of the present invention, the screening out the inquiry information features whose correlation coefficients satisfy the preset correlation coefficient condition, and obtaining the first inquiry feature includes: sorting the correlation coefficients according to the correlation coefficient values from high to low to obtain a correlation coefficient sequence; and sequentially screening a plurality of inquiry information characteristics in the correlation coefficient sequence according to the sequence of the correlation coefficients, and taking the screened inquiry information characteristics as first inquiry characteristics.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing statistics on historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating distribution data of the corresponding first inquiry features in the historical inquiry data based on the number includes: classifying the historical inquiry records according to the information of the used medicines to obtain a classified inquiry record set; calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry features with the maximum correlation in the classified inquiry record set, and marking the first inquiry features with the maximum correlation as main features related to the classified inquiry record; and generating distribution data of the first inquiry characteristics in the historical inquiry data based on the quantity of the historical inquiry records containing each main characteristic in the historical inquiry records.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the cleaning the historical inquiry records corresponding to the first inquiry features, and forming an inquiry data training set by using the cleaned historical inquiry records and the corresponding distribution data thereof includes: performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set; performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set; and extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set includes: pre-cleaning the historical inquiry data, and removing dirty data to obtain a pre-cleaning data set; and carrying out validity matching cleaning on the pre-cleaning data set, and removing illegal data to obtain a primary cleaning data set.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing secondary cleaning on the primary cleaning data set to remove historical inquiry data that does not conform to the distribution data, and obtaining a secondary cleaning data set includes: acquiring the used medicine information in the primary cleaning data set, and drawing a box plot according to the type of the used medicine information and the first inquiry characteristics corresponding to that type of medicine information; screening the historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and removing the abnormal data; and forming a secondary cleaning data set from the remaining historical inquiry data in the primary cleaning data set.
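The box-plot screening in the sixth implementation manner can be illustrated with a minimal Python sketch. The source does not state which rule marks a point as abnormal; the sketch assumes the common convention of fences placed 1.5 interquartile ranges beyond the first and third quartiles. The record fields `medicine` and `age` and the function name `remove_boxplot_outliers` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import quantiles

def remove_boxplot_outliers(records, value_key, group_key, whisker=1.5):
    """Drop records whose value lies outside the box-plot fences of its group.

    Assumption: fences are Q1 - whisker*IQR and Q3 + whisker*IQR; the
    source only says that abnormal data is found "based on the box plot".
    """
    # Collect the numeric values of each medicine type.
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    # Compute the lower and upper fence for each group.
    fences = {}
    for group, values in groups.items():
        if len(values) < 2:
            continue  # too few points for quartiles; such groups are kept as-is
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[group] = (q1 - whisker * iqr, q3 + whisker * iqr)

    # Keep only the records that fall inside their group's fences.
    kept = []
    for record in records:
        low, high = fences.get(record[group_key], (float("-inf"), float("inf")))
        if low <= record[value_key] <= high:
            kept.append(record)
    return kept

# Hypothetical primary cleaning data set: patient age per prescribed medicine.
primary = (
    [{"medicine": "drug_a", "age": a} for a in (30, 32, 34, 36, 38, 95)]
    + [{"medicine": "drug_b", "age": a} for a in (5, 6, 7, 8)]
)
secondary = remove_boxplot_outliers(primary, "age", "medicine")
```

In this toy data set the record with age 95 falls outside the fences computed for `drug_a` and is dropped, while every `drug_b` record is kept. Grouping by medicine type before computing the fences mirrors the source's step of drawing the box plot per type of used medicine information.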
A second aspect of the present invention provides a medicine prediction apparatus comprising: a first characteristic acquisition module, used for acquiring authorized historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records; a distribution data calculation module, used for counting the historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating the distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number; a training set construction module, used for cleaning the historical inquiry records corresponding to the first inquiry characteristics and forming an inquiry data training set from the cleaned historical inquiry records and the corresponding distribution data; a training module, used for training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model; a second characteristic acquisition module, used for acquiring, after a medicine prediction request is received, an inquiry information text corresponding to the medicine prediction request and extracting second inquiry characteristics in the inquiry information text; and a prediction module, used for inputting the second inquiry characteristics into the medicine prediction model to perform medicine prediction so as to obtain a medicine prediction result corresponding to the second inquiry characteristics.
Optionally, in a first implementation manner of the second aspect of the present invention, the first feature obtaining module includes: the character string acquisition unit is used for acquiring a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data; a correlation coefficient calculation unit, configured to extract inquiry information features and used medicine information in the historical inquiry character string data, and calculate a correlation coefficient between the inquiry information features and the used medicine information; and the characteristic screening unit is used for screening out the inquiry information characteristics of which the correlation coefficients meet the preset correlation coefficient conditions to obtain first inquiry characteristics.
Optionally, in a second implementation manner of the second aspect of the present invention, the feature screening unit is specifically configured to: sorting the correlation coefficients according to the correlation coefficient values from high to low to obtain a correlation coefficient sequence; and sequentially screening a plurality of inquiry information characteristics in the correlation coefficient sequence according to the sequence of the correlation coefficients, and taking the screened inquiry information characteristics as first inquiry characteristics.
Optionally, in a third implementation manner of the second aspect of the present invention, the distributed data calculation module includes: the data classification unit is used for classifying the historical inquiry records according to the information of the used medicines to obtain a classified inquiry record set; the characteristic analysis unit is used for calling a principal component analysis method to analyze the first inquiry characteristics in the classified inquiry record set to obtain the first inquiry characteristics with the maximum correlation in the classified inquiry record set, and marking the first inquiry characteristics with the maximum correlation as the main characteristics related to the classified inquiry record; and the calculation unit is used for generating distribution data of the first inquiry characteristics in the historical inquiry data based on the quantity of the historical inquiry records containing each main characteristic in the historical inquiry records.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the training set constructing module includes: the primary cleaning unit is used for carrying out primary data cleaning on the historical inquiry data, removing error data and obtaining a primary cleaning data set; the secondary cleaning unit is used for carrying out secondary cleaning on the primary cleaning data set, removing historical inquiry data which do not accord with the distribution data, and obtaining a secondary cleaning data set; and the training set construction unit is used for extracting historical inquiry data in the secondary cleaning data set according to the distribution data and forming an inquiry data training set by the extracted historical inquiry data.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the primary cleaning unit includes: the pre-cleaning subunit is used for pre-cleaning the historical inquiry data, removing dirty data and obtaining a pre-cleaning data set; and the legality cleaning subunit is used for carrying out legality matching cleaning on the pre-cleaning data set, removing illegal data and obtaining a primary cleaning data set.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the secondary cleaning unit includes: the box plot drawing subunit is used for acquiring the used medicine information in the primary cleaning data set and drawing a box plot according to the type of the used medicine information and the first inquiry characteristic corresponding to that type of medicine information; an abnormal value removing subunit, configured to screen historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and remove the abnormal data; and the data set construction subunit is used for forming the remaining historical inquiry data in the primary cleaning data set into a secondary cleaning data set.
A third aspect of the present invention provides medicine prediction equipment comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the medicine prediction equipment to perform the medicine prediction method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to execute the above-mentioned medicine prediction method.
According to the technical scheme, historical inquiry data are obtained, all first inquiry characteristics in the historical inquiry data are extracted, wherein the historical inquiry data comprise a plurality of historical inquiry records; counting the number of the historical inquiry records corresponding to each first inquiry characteristic in the historical inquiry records, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data; cleaning the historical inquiry records corresponding to the first inquiry characteristics, and then forming an inquiry data training set according to the generated distribution data; calling the inquiry data training set to train a preset deep learning tool to obtain a medicine prediction model; after receiving the medicine prediction request, acquiring an inquiry information text and extracting a second inquiry characteristic; and inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics. In the embodiment of the invention, when the inquiry data training set for generating the medicine prediction model is subjected to data processing, the data is processed according to the original distribution data in the historical inquiry data, so that the precision of the medicine prediction model in the application is improved, and the accuracy of medicine prediction is improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for predicting a drug product according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a method for drug prediction in an embodiment of the present invention;
FIG. 3 is a schematic diagram of another embodiment of a method for drug prediction in an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a method for drug prediction in an embodiment of the present invention;
FIG. 5 is a schematic view of a box plot used in an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of a drug prediction device in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of another embodiment of a medication prediction apparatus in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of a medicine prediction apparatus according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a medicine prediction method, a device, equipment and a storage medium, wherein the method specifically comprises the steps of obtaining historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records; counting the number of the historical inquiry records corresponding to each first inquiry characteristic in the historical inquiry records, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data; cleaning the historical inquiry records corresponding to the first inquiry characteristics, and then forming an inquiry data training set according to the generated distribution data; calling the inquiry data training set to train a preset deep learning tool to obtain a medicine prediction model; after receiving the medicine prediction request, acquiring an inquiry information text and extracting a second inquiry characteristic; and inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics. In the embodiment of the invention, when the inquiry data training set for generating the medicine prediction model is subjected to data processing, the data is processed according to the original distribution data in the historical inquiry data, so that the precision of the medicine prediction model in the application is improved, and the accuracy of medicine prediction is improved.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the medicine prediction method according to an embodiment of the present invention includes:
101. Obtaining authorized historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data;
It is to be understood that the executing subject of the present invention may be a medicine prediction device, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the executing subject.
In this embodiment, the historical inquiry data includes a plurality of historical inquiry records. The contents of the historical inquiry records on an inquiry platform or network are extracted by an information extraction tool; these records are information data whose use has been authorized with the consent of the inquiring party. The information data includes patient information, inquiry information, diagnosis results, and medication information for the inquiry. The patient information includes the patient's age, sex, pregnancy status, allergy history, contraindications, and the like; the inquiry information includes the department visited, the chief complaint, and the like. Because the obtained information data includes structured, semi-structured and unstructured data types, it is first sorted and converted to a unified data format to obtain the historical inquiry data.
Because the patient information, inquiry information and diagnosis results contained in the historical inquiry data are correlated to some degree, and the diagnosis results are directly correlated with the medication information, this embodiment uses a filter method: the data features contained in the historical inquiry data are extracted, each data feature is given a correlation score against the medication information, the data features with higher scores are selected and stored as first inquiry characteristics, and in this way all first inquiry characteristics contained in the historical inquiry data are extracted.
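The correlation scoring and screening described above can be sketched in Python as follows. The source does not name a specific correlation coefficient, so the sketch assumes the Pearson coefficient between a binary "feature present" column and a binary "medicine used" column. The record layout, the feature names and the function `select_first_features` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_first_features(records, feature_names, medicine, top_k):
    """Score each candidate feature against use of `medicine`, sort the
    scores from high to low, and keep the first `top_k` features."""
    target = [1 if r["medicine"] == medicine else 0 for r in records]
    scores = {}
    for name in feature_names:
        column = [1 if name in r["features"] else 0 for r in records]
        scores[name] = pearson(column, target)
    ranked = sorted(feature_names, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k], scores

# Hypothetical historical inquiry records after format unification.
records = [
    {"features": {"cough", "fever"}, "medicine": "drug_a"},
    {"features": {"cough"}, "medicine": "drug_a"},
    {"features": {"fever", "rash"}, "medicine": "drug_b"},
    {"features": {"rash"}, "medicine": "drug_b"},
]
selected, scores = select_first_features(
    records, ["cough", "fever", "rash"], "drug_a", top_k=2
)
```

Here `cough` is perfectly correlated with `drug_a`, `fever` is uncorrelated, and `rash` is negatively correlated, so the high-to-low ordering is `cough`, `fever`, `rash`. Whether a strongly negative coefficient should also count as "high correlation" is not settled by the source; ranking by absolute value would be an alternative design choice.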
102. Counting historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number;
In this embodiment, after the first inquiry features are obtained, the historical inquiry records are searched and screened according to each first inquiry feature, and the number of historical inquiry records containing each first inquiry feature is calculated. The resulting counts for the different inquiry features form the feature statistical data, from which the distribution data of each first inquiry feature in the historical inquiry data is calculated.
Further, since one piece of historical inquiry data may include a plurality of first inquiry features, in this embodiment a principal component analysis method may be used to extract, for each piece of historical inquiry data, the first inquiry feature with the greatest influence, that is, the feature with the greatest correlation, which is marked as the first inquiry feature of that piece of historical inquiry data. The historical inquiry data are then classified according to this feature to obtain a plurality of historical inquiry data classification sets. The number of pieces of historical inquiry data in each classification set is counted to obtain the feature statistical data, from which the distribution data of the first inquiry features is calculated. For example: if the number of acquired pieces of historical inquiry data related to gynecological patients is a, and the total number of pieces of historical inquiry data acquired in the previous step is 10a, the distribution data of the first inquiry feature "gynecological patients" is 10%.
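The distribution calculation in the example above (a records out of 10a gives 10%) can be written as a short Python sketch. It assumes each record has already been labeled with its single main feature; the labels and the function name `distribution_data` are hypothetical.

```python
from collections import Counter

def distribution_data(main_features):
    """Map each main feature to its share of all historical inquiry records."""
    counts = Counter(main_features)  # feature statistical data: records per feature
    total = len(main_features)
    return {feature: count / total for feature, count in counts.items()}

# Hypothetical labels: one main feature per historical inquiry record.
main_features = ["gynecology"] * 1 + ["pediatrics"] * 3 + ["dermatology"] * 6
shares = distribution_data(main_features)
```

With one gynecology record out of ten, `shares["gynecology"]` is 0.1, matching the 10% of the worked example.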
103. Cleaning the historical inquiry records corresponding to the first inquiry characteristics, and forming an inquiry data training set by the cleaned historical inquiry records and the corresponding distribution data;
Since the historical inquiry records in the historical inquiry data have a certain regularity and specificity, screening the data directly, without regard to that regularity and specificity, may damage the structure of the data set. In this embodiment, therefore, the historical inquiry records are first classified according to the first inquiry features; each classified inquiry record set is then cleaned separately to remove dirty data and noise, yielding a primary cleaning data set; finally, cleaned inquiry data is extracted from the primary cleaning data set in proportion to the distribution data obtained in the previous step to form the inquiry data training set.
The data distribution in the resulting inquiry data training set is the same as that in the original historical inquiry data set, which preserves the original regularity and specificity and prevents the data cleaning process from damaging the structure of the data set. This is also why the classified historical inquiry data is cleaned separately. For example, suppose the data for a certain medicine corresponding to a certain first inquiry feature accounts for 10% of the gynecological patient data, while the gynecological patient data accounts for only 10% of the historical inquiry data set. The data for that medicine then accounts for only 1% of the whole historical inquiry data set; if the whole data set were screened directly, such data would be removed with a certain probability, damaging the integrity of the historical inquiry data.
104. Training a preset deep learning tool according to an inquiry data training set to obtain a medicine prediction model;
After the cleaned inquiry data training set is obtained, the historical inquiry data in it is divided into a training set, a test set and a verification set, each of which has the same distribution data as the inquiry data training set. A preset deep learning tool is then trained with the training set, the test set and the verification set. The preset deep learning tool comprises a deep learning algorithm; the original parameters of the algorithm are adjusted based on the inquiry data training set to obtain training parameters, and the medicine prediction model is obtained based on the training parameters.
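A distribution-preserving division of this kind can be sketched as a stratified split. The sketch below is only one possible reading: the 80/10/10 ratios, the field name `main_feature` and the function name `stratified_split` are assumptions, since the source does not state how the three subsets are sized.

```python
import random
from collections import defaultdict


def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/test/verification subsets group by group,
    so that every subset keeps the feature distribution of the input."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    rng = random.Random(seed)
    train, test, verification = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_test = round(len(group) * ratios[1])
        train.extend(group[:n_train])
        test.extend(group[n_train:n_train + n_test])
        verification.extend(group[n_train + n_test:])  # remainder of the group
    return train, test, verification


records = ([{"main_feature": "gynecology"}] * 10
           + [{"main_feature": "other"}] * 90)
train, test, verification = stratified_split(records, "main_feature")
print(len(train), len(test), len(verification))  # 80 10 10
```

Splitting inside each feature group, rather than over the whole set at once, is what keeps a 10% feature at 10% in all three subsets.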
105. After receiving the medicine prediction request, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting second inquiry characteristics in the inquiry information text;
After the medicine prediction model is established, a medicine prediction request is received, the inquiry information text contained in the request is acquired, and the second inquiry features contained in that text are extracted from its content. The second inquiry features are similar in content to the first inquiry features extracted in the previous steps: the inquiry information text includes patient information such as patient age, sex, inoculation condition, allergy history and contraindications, and inquiry information such as the visiting department and the chief complaint. The acquired information is matched against the data features screened in the previous steps to obtain the second inquiry features contained in the current inquiry information text.
106. And inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics.
The second inquiry features are input into the constructed medicine prediction model for processing, and a medicine prediction result corresponding to the second inquiry features is output. The medicine prediction result is a candidate medicine that the medicine prediction model outputs on the basis of the medicines used in the historical inquiry data. In addition, after the medicine prediction model outputs the candidate medicine, a pre-established medicine database may be searched for substitute medicines that are the same as or highly similar to the candidate medicine, and these are output as recommended medicines.
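The substitute search could look roughly like the following sketch. The source does not say how similarity between medicines is measured; the Jaccard index over active-ingredient sets used here, the 0.5 threshold, the database layout and all drug and ingredient names are placeholders chosen purely for illustration.

```python
def find_substitutes(candidate, drug_db, min_similarity=0.5):
    """Return drugs from `drug_db` that are similar to `candidate`.

    `drug_db` maps a drug name to its set of active ingredients (hypothetical
    layout). Similarity is the Jaccard index of the ingredient sets, which is
    an assumption of this sketch, not a measure named in the source.
    """
    target = drug_db[candidate]
    scored = []
    for name, ingredients in drug_db.items():
        if name == candidate:
            continue
        union = target | ingredients
        similarity = len(target & ingredients) / len(union) if union else 0.0
        if similarity >= min_similarity:
            scored.append((similarity, name))
    # Most similar first; ties are broken alphabetically for determinism.
    return [name for _, name in sorted(scored, key=lambda item: (-item[0], item[1]))]


drug_db = {
    "drug_a": {"ingredient_x"},
    "drug_b": {"ingredient_x"},
    "drug_c": {"ingredient_x", "ingredient_y"},
    "drug_d": {"ingredient_z"},
}
substitutes = find_substitutes("drug_a", drug_db)
print(substitutes)  # ['drug_b', 'drug_c']
```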
In the embodiment of the invention, when the inquiry data training set for generating the medicine prediction model is subjected to data processing, the data is processed according to the original distribution data in the historical inquiry data, so that the accuracy of the medicine prediction method in the application on medicine prediction is improved.
Referring to fig. 2, another embodiment of a method for predicting a drug according to an embodiment of the present invention includes:
201. obtaining a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
It is to be understood that the execution subject of the present invention may be a medicine prediction device, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the execution subject.
In this embodiment, historical inquiry records on the inquiry platform or the network are extracted by an information extraction tool. These records are information data obtained with the approval of the inquiring party. The historical inquiry records are encoded character by character and converted into machine-readable character string data for storage.
202. Extracting inquiry information characteristics and medicine using information in historical inquiry character string data, and calculating a correlation coefficient between the inquiry information characteristics and the medicine using information;
203. screening out inquiry information characteristics of which the correlation coefficients meet preset correlation coefficient conditions to obtain first inquiry characteristics;
The contents contained in the character string data corresponding to the historical inquiry data obtained in the previous step, such as patient information, inquiry information, diagnosis results and medication information, are extracted. The patient information includes information such as patient age, sex, pregnancy condition, allergy history and contraindications; the inquiry information includes information such as the visiting department and the chief complaint. The features contained in this information are extracted by a filter method to obtain the inquiry information features, and the used medicine information in each piece of character string data is stored. Specifically, when the used medicine information is stored, a medicine name information base is acquired in advance and the different product names that denote the same medicine are associated with one another; when the used medicine information is acquired, inquiry information involving differently named products of the same medicine is treated as inquiry information for the same used medicine.
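The association of different product names with the same medicine can be pictured as a simple lookup. In the sketch below, the mapping `NAME_BASE`, the record field `drug` and all names are hypothetical stand-ins for the medicine name information base mentioned above.

```python
from collections import defaultdict

# Hypothetical medicine name information base: product name -> common drug name.
NAME_BASE = {
    "brand_a": "generic_1",
    "brand_b": "generic_1",
    "brand_c": "generic_2",
}


def group_by_drug(records, name_base):
    """Group inquiry records by the common name of the drug they used."""
    groups = defaultdict(list)
    for record in records:
        product = record["drug"]
        # Unknown product names are kept as their own group.
        common_name = name_base.get(product, product)
        groups[common_name].append(record)
    return dict(groups)


records = [
    {"id": 1, "drug": "brand_a"},
    {"id": 2, "drug": "brand_b"},
    {"id": 3, "drug": "brand_c"},
]
groups = group_by_drug(records, NAME_BASE)
print(sorted(groups))  # ['generic_1', 'generic_2']
```

Records 1 and 2 use different product names but land in the same group, which is the behaviour the paragraph above describes.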
Because the character string data corresponding to the historical inquiry data has a certain correlation between the patient information, the inquiry information and the diagnosis result, and the diagnosis result and the medication information have a direct correlation to a certain extent, in this embodiment, a univariate feature selection method is used to extract the data features contained in the obtained historical inquiry data, and perform correlation scoring according to the data features and the medication information, and select the data features with higher correlation according to the correlation scoring, and store the data features with higher correlation as the first inquiry features.
Specifically, a correlation coefficient between each inquiry information feature and the used medicine information is calculated, and the correlation is scored according to the correlation coefficient. The N highest-scoring features, or a certain percentage of the highest-scoring features, are retained. A common univariate statistical test may also be applied to each feature, based on the false positive rate (FPR), the false discovery rate (FDR) or the family-wise error rate (FWE), to select the inquiry information features that meet the correlation coefficient threshold. The inquiry information features that meet the threshold are stored as the first inquiry features.
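A minimal sketch of threshold-based screening follows. The source does not name the correlation coefficient; the Pearson coefficient, the 0.5 threshold, the 0/1 encoding of the columns and the feature names are all assumptions of this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep the features whose absolute correlation with `target` meets the threshold."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


# Toy data: one row per inquiry record, 1 = present, 0 = absent.
columns = {
    "is_pregnant": [1, 1, 0, 0, 1, 0],
    "age_over_60": [0, 1, 0, 1, 0, 1],
}
used_drug = [1, 1, 0, 0, 1, 0]  # whether a given medicine was used
selected = select_features(columns, used_drug)
print(sorted(selected))  # ['is_pregnant']
```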
204. Counting historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number;
The specific content of this step is substantially the same as that of step 102 in the previous embodiment, and is therefore not repeated here.
205. Performing data primary cleaning on historical inquiry data, and removing error data to obtain a primary cleaning data set;
The historical inquiry data is first cleaned once to remove erroneous data. This step mainly uses data cleaning techniques to clean the data at large scale. Specifically, erroneous data in the historical inquiry data, that is, data with quality problems that make it unsuitable as training data, is removed. Such data includes: data in which the inquiry dialogue was interrupted before completion; data lacking necessary feature items such as age, sex or prescription result; clearly abnormal data, such as an age far outside the normal range; clearly unreasonable data, such as a 40-year-old male whose department is shown as pediatrics; and obviously duplicated data. After the erroneous data is removed, a primary cleaning data set is obtained.
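The rules listed above can be expressed as a small rule-based filter. The sketch below is illustrative only: the field names, the age limits (0 to 120, pediatrics under 18) and the `dialogue_complete` flag are assumptions and are not specified in the source.

```python
REQUIRED_FIELDS = ("age", "sex", "department", "prescription")


def is_valid(record):
    """Return True if the record passes all primary cleaning rules."""
    # Missing necessary feature items.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    # Inquiry dialogue interrupted before completion.
    if not record.get("dialogue_complete", False):
        return False
    # Clearly abnormal age (assumed plausible range).
    if not 0 <= record["age"] <= 120:
        return False
    # Clearly unreasonable combination: an adult registered in pediatrics.
    if record["department"] == "pediatrics" and record["age"] >= 18:
        return False
    return True


def primary_clean(records):
    """Drop invalid records and exact duplicates, keeping the first occurrence."""
    seen, cleaned = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if is_valid(record) and key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


base = {"age": 30, "sex": "F", "department": "gynecology",
        "prescription": "drug_a", "dialogue_complete": True}
records = [
    base,
    dict(base),                                            # duplicate
    dict(base, age=40, sex="M", department="pediatrics"),  # unreasonable
    dict(base, age=200),                                   # abnormal age
    dict(base, age=None),                                  # missing item
    dict(base, age=25, dialogue_complete=False),           # interrupted dialogue
]
cleaned = primary_clean(records)
print(len(cleaned))  # 1
```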
206. Performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
A data distribution analysis method is called to analyze the data distribution characteristics of the primary cleaning data set obtained in the previous step, and extreme values are removed to obtain a secondary cleaning data set. For example, urological data for a 99-year-old male that appears only once is an extremely small special batch of data and is deleted.
207. Extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data;
The historical inquiry data in the secondary cleaning data set is extracted according to the distribution data, and the extracted data forms the inquiry data training set. The distribution data of the resulting training set is the same as the original distribution data of the first inquiry features obtained in the previous steps, so the training set retains the regularity and specificity of the original historical inquiry data set and the prediction model trains better.
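One way to picture extraction according to the distribution data is quota sampling per feature. In the following sketch the target size, the field name `main_feature` and the function name `sample_by_distribution` are assumptions; the source only states that the extracted set must reproduce the original distribution.

```python
import random
from collections import defaultdict


def sample_by_distribution(records, distribution, size, key="main_feature", seed=0):
    """Draw about `size` records so that each feature keeps its original share."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    rng = random.Random(seed)
    sample = []
    for feature, share in distribution.items():
        quota = round(size * share)  # number of records owed to this feature
        pool = groups.get(feature, [])
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample


# After cleaning, gynecology is over-represented (20 of 120 records),
# but the original distribution data says it should be 10%.
cleaned = ([{"main_feature": "gynecology"}] * 20
           + [{"main_feature": "other"}] * 100)
distribution = {"gynecology": 0.1, "other": 0.9}
training_set = sample_by_distribution(cleaned, distribution, size=50)
gyn = sum(1 for r in training_set if r["main_feature"] == "gynecology")
print(len(training_set), gyn)  # 50 5
```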
208. Training a preset deep learning tool according to an inquiry data training set to obtain a medicine prediction model;
209. after receiving the medicine prediction request, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting second inquiry characteristics in the inquiry information text;
210. and inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics.
The specific contents in steps 208, 209 and step 210 are substantially the same as those in steps 104, 105 and 106 in the foregoing embodiment, and therefore, the details are not repeated herein.
In the embodiment of the invention, when the inquiry data training set for generating the medicine prediction model is subjected to data processing, the inquiry characteristics in the historical inquiry data are firstly obtained, the original distribution data in the historical inquiry data are calculated according to the inquiry characteristics, and the inquiry data training set is generated according to the distribution data so as to obtain the medicine prediction model.
Referring to fig. 3, another embodiment of a method for predicting a drug according to an embodiment of the present invention includes:
301. obtaining authorized historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data;
The specific content of this step is substantially the same as that of step 101 in the previous embodiment, and is therefore not repeated here.
302. Classifying the historical inquiry records according to the information of the used medicines to obtain a classified inquiry record set;
Specifically, when the used medicine information is stored, a medicine name information base is acquired in advance and the different product names that denote the same medicine are associated with one another; when the used medicine information is acquired, inquiry information involving differently named products of the same medicine is treated as historical inquiry records for the same used medicine. The historical inquiry records are then classified according to the used medicine information to obtain a plurality of classified inquiry record sets.
303. Calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry features with the maximum correlation in the classified inquiry record set, and marking the first inquiry features with the maximum correlation as main features related to the classified inquiry record;
In this step there are a plurality of classified inquiry record sets, each containing a plurality of first inquiry features of several kinds relating to patient information, inquiry information and so on. To facilitate calculation on the data set, a principal component analysis method is called to analyze the first inquiry features in each classified inquiry record set, and the first inquiry features with the greatest relevance in the set are selected as the first inquiry features of that set and used to label it. Principal component analysis (PCA) is a statistical method that converts a group of possibly correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation; the converted variables are called principal components. In this embodiment, the first inquiry features with the greatest relevance in the data set are selected as the main features.
304. Generating distribution data of the first inquiry characteristics in the historical inquiry data based on the quantity of the historical inquiry records containing each main characteristic in the historical inquiry records;
After the main features are obtained, the number of historical inquiry records containing each main feature is counted, the result is taken as the feature statistical data, and a linear regression analysis method is called on the feature statistical data to obtain the distribution data of each main feature in the historical inquiry data. For example, if the number of pieces of historical inquiry data relating to gynecological patients is a and the total number of pieces of historical inquiry data acquired in the previous step is 10a, the distribution data of the first inquiry feature "gynecological patient" is 10%.
305. Performing data primary cleaning on historical inquiry data, and removing error data to obtain a primary cleaning data set;
306. performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
307. extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data;
The specific contents of steps 305, 306 and 307 are substantially the same as those of steps 205, 206 and 207 in the previous embodiment, and are therefore not repeated here.
308. training a preset deep learning tool according to an inquiry data training set to obtain a medicine prediction model;
309. after receiving the medicine prediction request, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting second inquiry characteristics in the inquiry information text;
310. and inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics.
The specific contents in steps 308, 309 and 310 are substantially the same as those in steps 104, 105 and 106 in the foregoing embodiment, and therefore, the detailed description thereof is omitted here.
In the embodiment of the invention, when the inquiry data training set for generating the medicine prediction model is subjected to data processing, the inquiry characteristics in the historical inquiry data are firstly obtained, the original distribution data in the historical inquiry data are calculated according to the inquiry characteristics, and the inquiry data training set is generated according to the distribution data so as to obtain the medicine prediction model.
Referring to fig. 4 and 5, another embodiment of a method for predicting a drug according to an embodiment of the present invention includes:
401. obtaining a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
The specific content of this step is substantially the same as that of step 201 in the previous embodiment, and is therefore not repeated here.
402. Extracting inquiry information characteristics and medicine using information in historical inquiry character string data, and calculating a correlation coefficient between the inquiry information characteristics and the medicine using information;
403. sorting the correlation coefficients from high to low by value to obtain a correlation coefficient sequence;
404. sequentially screening a plurality of inquiry information characteristics in the correlation coefficient sequence according to the sequence of the correlation coefficients, and taking the screened inquiry information characteristics as first inquiry characteristics;
The contents contained in the character string data corresponding to the historical inquiry data obtained in the previous step, such as patient information, inquiry information, diagnosis results and medication information, are extracted. The patient information includes information such as patient age, sex, pregnancy condition, allergy history and contraindications; the inquiry information includes information such as the visiting department and the chief complaint. The features contained in this information are extracted by a filter method to obtain the inquiry information features, and the used medicine information in each piece of character string data is stored. Specifically, when the used medicine information is stored, a medicine name information base is acquired in advance and the different product names that denote the same medicine are associated with one another; when the used medicine information is acquired, inquiry information involving differently named products of the same medicine is treated as inquiry information for the same used medicine.
Because the patient information, the inquiry information and the diagnosis result contained in the historical inquiry character string data have a certain correlation, and the diagnosis result and the medication information have a certain direct correlation, in this embodiment, a univariate feature selection method is used to extract the data features contained in the obtained historical inquiry character string data, and perform correlation scoring according to the data features and the medication information, and select the data features with higher correlation according to the correlation scoring, and store the data features with higher correlation as the first inquiry feature.
Specifically, a correlation coefficient between each inquiry information feature and the used medicine information is calculated, and the correlation is scored according to the correlation coefficient to obtain a correlation coefficient score. The inquiry information features are then sorted from high to low by correlation coefficient score to obtain a correlation coefficient sequence.
After the correlation coefficient sequence is obtained, at least one inquiry information feature is selected from it in sorted order. Specifically, the first N features of the sequence, or the inquiry information features in the top M% of the sequence, may be retained, and the screened inquiry information features are used as the first inquiry features.
In addition, a common univariate statistical test may be applied to each feature, based on the false positive rate (FPR), the false discovery rate (FDR) or the family-wise error rate (FWE), to select the inquiry information features that meet the correlation coefficient threshold, and the inquiry information features that meet the threshold are stored as the first inquiry features.
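The sorting and the two retention rules (first N, or top M%) can be sketched as below. The score values and feature names are toy values invented for illustration, and rounding the M% cut-off up with `math.ceil` is an assumption of this sketch.

```python
import math


def rank_features(scores):
    """Return feature names sorted by correlation score, highest first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def top_n(scores, n):
    """Keep the first N features of the correlation coefficient sequence."""
    return rank_features(scores)[:n]


def top_percent(scores, percent):
    """Keep the features in the top `percent` % of the sequence (rounded up)."""
    ranked = rank_features(scores)
    keep = math.ceil(len(ranked) * percent / 100)
    return ranked[:keep]


# Illustrative scores only; they are not taken from the source.
scores = {"department": 0.82, "age": 0.64, "sex": 0.31, "allergy_history": 0.12}
print(top_n(scores, 2))         # ['department', 'age']
print(top_percent(scores, 50))  # ['department', 'age']
```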
405. Classifying the historical inquiry records according to the information of the used medicines to obtain a classified inquiry record set;
406. calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry features with the maximum correlation in the classified inquiry record set, and marking the first inquiry features with the maximum correlation as main features related to the classified inquiry record;
407. generating distribution data of the first inquiry characteristics in the historical inquiry data based on the quantity of the historical inquiry records containing each main characteristic in the historical inquiry records;
The specific contents of steps 405, 406 and 407 are substantially the same as those of steps 302, 303 and 304 in the foregoing embodiment, and are therefore not repeated here.
408. Pre-cleaning historical inquiry data, removing dirty data and obtaining a pre-cleaning data set;
Data cleaning techniques are used to clean the data at large scale. Data with quality problems in the historical inquiry data, that is, data unsuitable as training data, is first removed to obtain the pre-cleaning data set. Such data includes: data in which the inquiry dialogue was interrupted; data lacking necessary feature items such as age, sex or prescription result; clearly abnormal data, such as an age far outside the normal range; and data with obvious errors, such as a 40-year-old male whose department is shown as pediatrics.
409. Carrying out legality matching cleaning on the pre-cleaning data set, and removing illegal data to obtain a primary cleaning data set;
After the pre-cleaning data set of step 408 is obtained, regular-expression matching is performed on it using the obtained first inquiry features. Specifically, a regular expression for validity matching is established in advance for this step; the regular expression is called to filter the character strings of the data obtained in the previous step and remove unnecessary characters, yielding the cleaned historical inquiry data set, that is, the primary cleaning data set.
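The source does not disclose the regular expression itself. The sketch below therefore shows only one plausible form of validity matching: stripping markup-like tags and characters outside an assumed whitelist. The patterns and the function name `clean_text` are assumptions.

```python
import re

# Assumed patterns, not taken from the source.
TAG = re.compile(r"<[^>]+>")                       # markup-like residue
ILLEGAL_CHARS = re.compile(r"[^\w\s.,;:?!%()/-]")  # anything outside the whitelist
MULTI_SPACE = re.compile(r"\s+")


def clean_text(text):
    """Remove tags and non-whitelisted characters, then normalize whitespace."""
    text = TAG.sub("", text)
    text = ILLEGAL_CHARS.sub("", text)
    return MULTI_SPACE.sub(" ", text).strip()


print(clean_text("Cough  for 3 days <br> ### fever 38.5"))
# Cough for 3 days fever 38.5
```

Because `\w` matches Unicode word characters in Python 3, the same whitelist also keeps Chinese text in the inquiry records intact.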
410. Acquiring the information of the used medicines in the primary cleaning data set, and drawing a box-type graph according to the information type of the used medicines and the first inquiry characteristics corresponding to the information type of the medicines;
Referring to fig. 5, the used medicine information in the primary cleaning data set is obtained, each medicine information type is taken as a numerical axis, and the corresponding first inquiry features are plotted as a box plot. The box plot, also called a box-and-whisker plot, is a statistical chart used to display the dispersion of a set of data; it is mainly used to reflect the distribution characteristics of the original data and can also be used to compare the distribution characteristics of several sets of data. A box plot is drawn as follows: first, find the upper edge, the lower edge, the median and the two quartiles of the data; then connect the two quartiles to draw the box; finally, connect the upper edge and the lower edge to the box, with the median lying inside the box. In this step, the box plot corresponding to the primary cleaning data set is drawn by the above method.
411. Screening historical inquiry data in the primary cleaning data set based on a box type graph to obtain abnormal data, and removing the abnormal data;
412. forming a secondary cleaning data set by the residual historical inquiry data in the primary cleaning data set;
With continued reference to fig. 5, after the box plot is obtained, data outliers are screened out based on it. Specifically, an outlier is defined as a value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR, where Q3 and Q1 denote the upper and lower quartiles of the batch of data, respectively, and IQR denotes the interquartile range (IQR = Q3 - Q1). The outliers are removed, and the remaining cleaned inquiry data forms the secondary cleaning data set. By screening out and removing outliers, the scheme of this embodiment removes noise and the interference of abnormal values. For example, urological data for a 99-year-old male that appears only once is an extremely small special batch of data; removing such outliers improves the accuracy of subsequent model prediction to a certain extent.
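The box-plot rule, under which values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are outliers, can be sketched with the Python standard library as follows. The `age` field, the sample ages and the choice of the "inclusive" quartile method are assumptions of this sketch; other quartile conventions give slightly different fences.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return the lower and upper fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(records, field):
    """Keep only the records whose `field` value lies within the fences."""
    low, high = iqr_bounds([r[field] for r in records])
    return [r for r in records if low <= r[field] <= high]


# Ages of patients in one department; the single 99-year-old is the outlier.
ages = [30, 32, 35, 38, 40, 41, 45, 99]
records = [{"age": a} for a in ages]
kept = remove_outliers(records, "age")
print([r["age"] for r in kept])  # [30, 32, 35, 38, 40, 41, 45]
```

For this sample, Q1 = 34.25 and Q3 = 42, so the fences are 22.625 and 53.625 and only the value 99 is removed.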
413. Extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data;
The specific content of this step is substantially the same as that of step 207 in the previous embodiment, and is therefore not repeated here.
414. Training a preset deep learning tool according to an inquiry data training set to obtain a medicine prediction model;
415. after receiving the medicine prediction request, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting second inquiry characteristics in the inquiry information text;
416. and inputting the second inquiry characteristics into the medicine prediction model for medicine prediction to obtain a medicine prediction result corresponding to the second inquiry characteristics.
The specific contents in steps 414, 415 and 416 are substantially the same as those in steps 104, 105 and 106 in the previous embodiment, and therefore, the detailed description thereof is omitted here.
In the embodiment of the invention, when data processing is performed on the inquiry data training set used to generate the medicine prediction model, the distribution data of the historical inquiry data is first calculated from the inquiry features in the historical inquiry data. After the historical inquiry data is cleaned and screened, the inquiry data training set is generated according to the obtained distribution data, so that the regularity and specificity of the original historical inquiry data are retained and the accuracy of the medicine prediction method of the present application is improved.
The medicine prediction method in the embodiment of the present invention has been described above; the medicine prediction device in the embodiment of the present invention is described below. Referring to fig. 6, one embodiment of the medicine prediction device in the embodiment of the present invention includes:
a first feature obtaining module 601, configured to obtain authorized historical inquiry data and extract all first inquiry features in the historical inquiry data, where the historical inquiry data includes multiple historical inquiry records;
a distribution data calculation module 602, configured to count historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generate distribution data of the corresponding first inquiry features in the historical inquiry data based on the number;
a training set constructing module 603, configured to clean the historical inquiry records corresponding to each first inquiry feature, and form an inquiry data training set by using the cleaned historical inquiry records and the corresponding distribution data;
a training module 604, configured to train a preset deep learning tool according to the inquiry data training set to obtain a drug prediction model;
a second feature obtaining module 605, configured to obtain, after receiving a drug prediction request, an inquiry information text corresponding to the drug prediction request, and extract a second inquiry feature in the inquiry information text;
and a prediction module 606, configured to input the second inquiry features into the drug prediction model to perform drug prediction, so as to obtain a drug prediction result corresponding to the second inquiry features.
In this embodiment of the invention, when the inquiry data training set used to generate the medicine prediction model is prepared, the data is processed according to the original distribution data of the historical inquiry data, which improves the accuracy of the medicine prediction apparatus of the present application.
Referring to fig. 7, another embodiment of a medicine prediction apparatus according to an embodiment of the present invention includes:
a first feature obtaining module 601, configured to obtain authorized historical inquiry data and extract all first inquiry features in the historical inquiry data, where the historical inquiry data includes multiple historical inquiry records;
a distribution data calculation module 602, configured to count historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generate distribution data of the corresponding first inquiry features in the historical inquiry data based on the number;
a training set constructing module 603, configured to clean the historical inquiry records corresponding to each first inquiry feature, and form an inquiry data training set by using the cleaned historical inquiry records and the corresponding distribution data;
a training module 604, configured to train a preset deep learning tool according to the inquiry data training set to obtain a drug prediction model;
a second feature obtaining module 605, configured to obtain, after receiving a drug prediction request, an inquiry information text corresponding to the drug prediction request, and extract a second inquiry feature in the inquiry information text;
and a prediction module 606, configured to input the second inquiry features into the drug prediction model to perform drug prediction, so as to obtain a drug prediction result corresponding to the second inquiry features.
Optionally, the first feature obtaining module 601 includes:
a character string obtaining unit 6011, configured to obtain multiple historical inquiry records in authorized historical inquiry data, and perform format conversion on the historical inquiry records to obtain historical inquiry character string data;
a correlation coefficient calculation unit 6012, configured to extract an inquiry information feature and used drug information in the historical inquiry character string data, and calculate a correlation coefficient between the inquiry information feature and the used drug information;
a feature screening unit 6013, configured to screen out the inquiry information features whose correlation coefficients satisfy a preset correlation coefficient condition, so as to obtain the first inquiry features.
Optionally, the feature screening unit 6013 is specifically configured to:
sort the correlation coefficients from high to low by value to obtain a correlation coefficient sequence;
and select a plurality of inquiry information features in order from the correlation coefficient sequence, and take the selected inquiry information features as the first inquiry features.
Optionally, the distribution data calculation module 602 includes:
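The screening step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the Pearson coefficient as the correlation measure, binary-encoded features, and a "top k" rule as the preset condition. The names `pearson`, `select_first_features`, `top_k` and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_first_features(feature_columns, drug_used, top_k):
    """Rank features by correlation with the drug indicator and keep the top_k."""
    scored = [(name, pearson(values, drug_used))
              for name, values in feature_columns.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # high to low
    return [name for name, _ in scored[:top_k]]


# Hypothetical toy data: 1 = symptom present / drug used, 0 = absent / not used.
feature_columns = {
    "fever": [1, 1, 0, 0, 1, 0],
    "cough": [1, 0, 1, 0, 1, 0],
    "rash": [0, 0, 1, 1, 0, 1],
}
drug_used = [1, 1, 0, 0, 1, 0]
selected = select_first_features(feature_columns, drug_used, top_k=2)
print(selected)
```

Sorting by the signed coefficient follows the "high to low" wording of the text; ranking by absolute value would be a different design choice that also keeps strongly negative associations.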
a data classification unit 6021, configured to classify the historical inquiry records according to the information of the used drugs, so as to obtain a classified inquiry record set;
a feature analysis unit 6022, configured to invoke a principal component analysis method to analyze the first inquiry features in the classified inquiry record set, so as to obtain the first inquiry features with the maximum correlation in the classified inquiry record set, and mark the first inquiry features with the maximum correlation as main features related to the classified inquiry record;
a calculating unit 6023, configured to generate distribution data of the first inquiry features in the historical inquiry data based on the number of the historical inquiry records containing each of the main features.
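The counting step of the calculating unit can be illustrated with a short sketch. It covers only the count-and-normalize part: the main features are passed in directly rather than derived by principal component analysis, and the record layout (`features`, `drug`) and the function name are assumptions made for illustration.

```python
from collections import Counter


def distribution_of_main_features(records, main_features):
    """Count the records containing each main feature and normalize to shares."""
    counts = Counter()
    for record in records:
        for feature in main_features:
            if feature in record["features"]:
                counts[feature] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {feature: counts[feature] / total for feature in main_features}


# Hypothetical historical inquiry records.
records = [
    {"features": {"fever", "cough"}, "drug": "drug_a"},
    {"features": {"fever"}, "drug": "drug_a"},
    {"features": {"rash"}, "drug": "drug_b"},
    {"features": {"fever", "headache"}, "drug": "drug_a"},
]
distribution = distribution_of_main_features(records, ["fever", "rash"])
print(distribution)
```

Expressing the distribution as shares rather than raw counts makes it directly usable as a sampling ratio when the training set is assembled later.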
Optionally, the training set constructing module 603 includes:
a primary cleaning unit 6031, configured to perform primary data cleaning on the historical inquiry data, and remove error data to obtain a primary cleaning data set;
a secondary cleaning unit 6032, configured to perform secondary cleaning on the primary cleaning data set, remove historical inquiry data that does not conform to the distribution data, and obtain a secondary cleaning data set;
a training set constructing unit 6033, configured to extract historical inquiry data in the secondary cleaning data set according to the distribution data, and compose the extracted historical inquiry data into an inquiry data training set.
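One possible reading of "extracting historical inquiry data according to the distribution data" is proportional sampling, sketched below. The text does not specify the extraction rule, so the quota calculation, the fixed random seed and the names `sample_by_distribution`, `records_by_feature` and `sample_size` are assumptions.

```python
import random


def sample_by_distribution(records_by_feature, distribution, sample_size, seed=0):
    """Draw records per main feature in proportion to its share of the data."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    training_set = []
    for feature, share in distribution.items():
        pool = records_by_feature.get(feature, [])
        quota = min(len(pool), round(share * sample_size))
        training_set.extend(rng.sample(pool, quota))
    return training_set


# Hypothetical cleaned records, grouped by main feature.
records_by_feature = {
    "fever": [f"fever-{i}" for i in range(10)],
    "rash": [f"rash-{i}" for i in range(5)],
}
distribution = {"fever": 0.75, "rash": 0.25}
training_set = sample_by_distribution(records_by_feature, distribution, sample_size=8)
print(len(training_set))
```

Sampling this way keeps the feature proportions of the training set close to those of the original historical data, which is the stated purpose of carrying the distribution data through the cleaning steps.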
Optionally, the primary cleaning unit 6031 includes:
a pre-cleaning subunit, configured to pre-clean the historical inquiry data and remove dirty data to obtain a pre-cleaning data set;
and a validity cleaning subunit, configured to perform validity-matching cleaning on the pre-cleaning data set and remove invalid data to obtain the primary cleaning data set.
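The two-stage primary cleaning can be sketched as follows. The text does not define "dirty" or "illegal" data, so this sketch assumes that dirty data means records with missing fields or exact duplicates, and that validity matching means checking the drug name against a known list and the age against a plausible range. All field names and thresholds are hypothetical.

```python
def pre_clean(records):
    """Drop records with missing required fields and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if not record.get("symptoms") or not record.get("drug"):
            continue  # missing field: treated as dirty data
        key = (tuple(sorted(record["symptoms"])), record["drug"], record.get("age"))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(record)
    return cleaned


def validity_clean(records, known_drugs, age_range=(0, 120)):
    """Keep only records whose values match the assumed validity rules."""
    low, high = age_range
    return [
        r for r in records
        if r["drug"] in known_drugs
        and isinstance(r.get("age"), int)
        and low <= r["age"] <= high
    ]


# Hypothetical raw historical inquiry records.
raw_records = [
    {"symptoms": ["fever", "cough"], "drug": "drug_a", "age": 34},
    {"symptoms": ["cough", "fever"], "drug": "drug_a", "age": 34},  # duplicate
    {"symptoms": [], "drug": "drug_a", "age": 50},                  # no symptoms
    {"symptoms": ["rash"], "drug": "", "age": 20},                  # no drug
    {"symptoms": ["rash"], "drug": "unknown_drug", "age": 41},      # unknown drug
    {"symptoms": ["headache"], "drug": "drug_b", "age": 300},       # implausible age
    {"symptoms": ["rash"], "drug": "drug_b", "age": 27},
]
pre_cleaned = pre_clean(raw_records)
primary_cleaned = validity_clean(pre_cleaned, known_drugs={"drug_a", "drug_b"})
print(len(pre_cleaned), len(primary_cleaned))
```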
Optionally, the secondary cleaning unit 6032 includes:
a box plot drawing subunit, configured to acquire the used-drug information in the primary cleaning data set and draw a box plot according to the type of the used-drug information and the first inquiry features corresponding to that type;
an abnormal value removing subunit, configured to screen the historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and remove the abnormal data;
and a data set construction subunit, configured to form the remaining historical inquiry data in the primary cleaning data set into the secondary cleaning data set.
In this embodiment of the invention, when the inquiry data training set used to generate the medicine prediction model is prepared, the distribution data of the historical inquiry data is first calculated from the inquiry features in that data. After the historical inquiry data has been cleaned and screened, the inquiry data training set is generated according to this distribution data, so that the regularity and particularity of the original historical inquiry data are retained and the accuracy of the medicine prediction apparatus of the present application is improved.
Figs. 6 and 7 describe the medicine prediction apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the medicine prediction device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a medicine prediction device 800 according to an embodiment of the present invention. The medicine prediction device 800 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored on the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the medicine prediction device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the medicine prediction device 800, the series of instruction operations stored in the storage medium 830.
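Screening "based on the box plot" can be illustrated numerically without drawing anything. The sketch below assumes the conventional box-plot rule, in which a value is abnormal when it lies more than 1.5 interquartile ranges outside the quartiles. The text does not state which rule is used, and the field names `drug` and `temperature` are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def remove_boxplot_outliers(records, feature, whisker=1.5):
    """Per drug type, drop records whose feature value lies outside the whiskers."""
    groups = defaultdict(list)
    for record in records:
        groups[record["drug"]].append(record)

    kept = []
    for group in groups.values():
        values = [r[feature] for r in group]
        if len(values) < 4:
            kept.extend(group)  # too few points to estimate quartiles reliably
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - whisker * iqr, q3 + whisker * iqr
        kept.extend(r for r in group if low <= r[feature] <= high)
    return kept


# Hypothetical primary cleaning data set with one implausible temperature.
primary = [{"drug": "drug_a", "temperature": t}
           for t in (36.5, 36.8, 37.0, 37.2, 37.5, 42.0)]
primary.append({"drug": "drug_b", "temperature": 39.0})
secondary = remove_boxplot_outliers(primary, "temperature")
print(len(secondary))
```

Grouping by drug type before computing the quartiles mirrors the text's requirement that the box plot be drawn per type of used-drug information, so a value that is normal for one drug is not judged against the records of another.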
The medicine prediction device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 860, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will appreciate that the structure shown in fig. 8 does not limit the medicine prediction device, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
The present invention also provides a drug prediction device comprising a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the drug prediction method in the above embodiments.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, each of which contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
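The idea of data blocks linked by a cryptographic method can be shown with a toy hash chain. This is only an illustration of the linking and verification principle, with no consensus mechanism or network layer. The block layout and the function names `make_block` and `chain_is_valid` are assumptions, not part of the described system.

```python
import hashlib
import json


def make_block(transactions, previous_hash):
    """Build a block whose hash covers its transactions and its predecessor's hash."""
    payload = json.dumps(
        {"transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return {
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def chain_is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    for index, block in enumerate(chain):
        expected = make_block(block["transactions"], block["previous_hash"])["hash"]
        if block["hash"] != expected:
            return False  # block contents were altered after hashing
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = make_block(["record-1"], previous_hash="0" * 64)
second = make_block(["record-2", "record-3"], previous_hash=genesis["hash"])
chain = [genesis, second]
print(chain_is_valid(chain))
```

Because each hash covers the previous block's hash, changing any stored record invalidates that block and every block after it, which is what makes tampering detectable.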
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the drug prediction method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for predicting a drug, the method comprising:
obtaining authorized historical inquiry data and extracting all first inquiry characteristics in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records;
counting historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number;
cleaning the historical inquiry records corresponding to the first inquiry characteristics, and forming an inquiry data training set by the cleaned historical inquiry records and the corresponding distribution data;
training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
after a medicine prediction request is received, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting a second inquiry characteristic in the inquiry information text;
and inputting the second inquiry characteristics into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry characteristics.
2. The drug prediction method of claim 1, wherein the obtaining authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data comprises:
obtaining a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
extracting inquiry information characteristics and medicine using information in the historical inquiry character string data, and calculating a correlation coefficient between the inquiry information characteristics and the medicine using information;
screening out the inquiry information characteristics of which the correlation coefficients meet the preset correlation coefficient conditions to obtain first inquiry characteristics.
3. The drug prediction method of claim 2, wherein the screening out the inquiry information features with correlation coefficients satisfying a preset correlation coefficient condition to obtain the first inquiry feature comprises:
sorting the correlation coefficients according to the correlation coefficient values from high to low to obtain a correlation coefficient sequence;
and sequentially screening a plurality of inquiry information characteristics in the correlation coefficient sequence according to the sequence of the correlation coefficients, and taking the screened inquiry information characteristics as first inquiry characteristics.
4. The drug prediction method according to claim 2 or 3, wherein the step of counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and the step of generating the distribution data of the corresponding first inquiry features in the historical inquiry data based on the number comprises the steps of:
classifying the historical inquiry records according to the information of the used medicines to obtain a classified inquiry record set;
calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry features with the maximum correlation in the classified inquiry record set, and marking the first inquiry features with the maximum correlation as main features related to the classified inquiry record;
and generating distribution data of the first inquiry characteristics in the historical inquiry data based on the quantity of the historical inquiry records containing each main characteristic in the historical inquiry records.
5. The method for predicting drugs according to claim 1, wherein the step of cleaning the historical inquiry records corresponding to the first inquiry features and the step of forming an inquiry data training set by using the cleaned historical inquiry records and the corresponding distribution data thereof comprises:
performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set;
performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
and extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data.
6. The drug prediction method of claim 5, wherein the performing a data primary cleaning on the historical inquiry data to remove erroneous data to obtain a primary cleaning data set comprises:
pre-cleaning the historical inquiry data, and removing dirty data to obtain a pre-cleaning data set;
and carrying out validity matching cleaning on the pre-cleaning data set, and removing illegal data to obtain a primary cleaning data set.
7. The drug prediction method of claim 5 or 6, wherein the performing a secondary cleaning of the primary cleaning data set to remove historical inquiry data that does not comply with the distribution data, and obtaining a secondary cleaning data set comprises:
acquiring the used-drug information in the primary cleaning data set, and drawing a box plot according to the type of the used-drug information and the first inquiry features corresponding to that type;
screening the historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and removing the abnormal data;
and forming a secondary cleaning data set by the rest historical inquiry data in the primary cleaning data set.
8. A medication prediction apparatus, characterized in that the medication prediction apparatus comprises:
a first feature acquisition module, configured to obtain authorized historical inquiry data and extract all first inquiry features in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records;
the distribution data calculation module is used for counting the historical inquiry records in the historical inquiry data according to the first inquiry characteristics to obtain the number of the historical inquiry records corresponding to each first inquiry characteristic, and generating the distribution data of the corresponding first inquiry characteristics in the historical inquiry data based on the number;
the training set construction module is used for cleaning the historical inquiry records corresponding to the first inquiry characteristics and forming an inquiry data training set by the cleaned historical inquiry records and the corresponding distribution data;
the training module is used for training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
the second characteristic acquisition module is used for acquiring an inquiry information text corresponding to the medicine prediction request after receiving the medicine prediction request and extracting second inquiry characteristics in the inquiry information text;
and the prediction module is used for inputting the second inquiry characteristics into the medicine prediction model to perform medicine prediction so as to obtain a medicine prediction result corresponding to the second inquiry characteristics.
9. A medication prediction device, characterized in that it comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the medication prediction device to perform the steps of the medication prediction method of any of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the drug prediction method according to any one of claims 1-7.
CN202110566394.7A 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium Active CN113284577B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110566394.7A CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium
PCT/CN2022/088787 WO2022247549A1 (en) 2021-05-24 2022-04-24 Drug prediction method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110566394.7A CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113284577A true CN113284577A (en) 2021-08-20
CN113284577B CN113284577B (en) 2023-08-11

Family

ID=77281166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110566394.7A Active CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113284577B (en)
WO (1) WO2022247549A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688329A (en) * 2021-08-25 2021-11-23 平安国际智慧城市科技股份有限公司 Information pushing method, device, equipment and storage medium based on medical service
WO2022247549A1 (en) * 2021-05-24 2022-12-01 康键信息技术(深圳)有限公司 Drug prediction method, apparatus and device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108597603A (en) * 2018-05-04 2018-09-28 吉林大学 Cancer return forecasting system based on Multi-dimensional Gaussian distribution Bayes's classification
JP2020021371A (en) * 2018-08-02 2020-02-06 Necソリューションイノベータ株式会社 Post-operation infection predicting apparatus, method of producing post-operation infection predicting apparatus, post-operation infection predicting method and program
CN112037880A (en) * 2020-08-31 2020-12-04 康键信息技术(深圳)有限公司 Medication recommendation method, device, equipment and storage medium
CN112214613A (en) * 2020-10-15 2021-01-12 平安国际智慧城市科技股份有限公司 Artificial intelligence-based medication recommendation method and device, electronic equipment and medium
CN112489769A (en) * 2019-08-22 2021-03-12 浙江远图互联科技股份有限公司 Intelligent traditional Chinese medicine diagnosis and medicine recommendation system for chronic diseases based on deep neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11899022B2 (en) * 2017-08-23 2024-02-13 The General Hospital Corporation Multiplexed proteomics and predictive drug candidate assessment
CN109087691A (en) * 2018-08-02 2018-12-25 科大智能机器人技术有限公司 A kind of OTC drugs recommender system and recommended method based on deep learning
CN109360604B (en) * 2018-11-21 2021-09-24 南昌大学 Ovarian cancer molecular typing prediction system
CN111613289B (en) * 2020-05-07 2023-04-28 浙江大学医学院附属第一医院 Individuation medicine dosage prediction method, device, electronic equipment and storage medium
CN112735535B (en) * 2021-04-01 2021-06-25 腾讯科技(深圳)有限公司 Prediction model training method, prediction model training device, data prediction method, data prediction device and storage medium
CN113284577B (en) * 2021-05-24 2023-08-11 康键信息技术(深圳)有限公司 Medicine prediction method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LEI ZHANG等: "Ancient terms of chronic renal failure: The key to ancient literature mining" *
陈静锋: "基于电子病历的典型诊疗模式挖掘方法研究" *

Also Published As

Publication number Publication date
CN113284577B (en) 2023-08-11
WO2022247549A1 (en) 2022-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant