CN113362116B - Medicine market scale prediction system based on machine learning - Google Patents


Info

Publication number
CN113362116B
Authority
CN
China
Prior art keywords
data
model
medicine
prediction
purchase
Legal status
Active
Application number
CN202110739439.6A
Other languages
Chinese (zh)
Other versions
CN113362116A
Inventor
朱仁
卓绮雯
李晓彤
劳丽玫
Current Assignee
Shenzhen Quanyaowang Technology Co ltd
Original Assignee
Shenzhen Quanyaowang Technology Co ltd
Application filed by Shenzhen Quanyaowang Technology Co ltd
Priority to CN202110739439.6A
Publication of CN113362116A
Application granted
Publication of CN113362116B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the technical field of medical big data, and in particular to a machine learning-based drug market scale prediction system that can predict trends in drug market purchase quantities, assist in formulating drug market purchasing plans, help users understand market positioning, and assist in monitoring and evaluating rational drug use. The method comprises the following steps: S1, data requirements; S2, purchase-quantity-related data; S3, data cleaning; S4, data inclusion check; S5, multi-dimensional statistics; S6, index library; S7, random grouping; S8, variable importance evaluation; S9, prediction model building; S10, check whether all variables have been traversed; S11, model evaluation; S12, expert evaluation; S13, practicability judgment; S14, real-environment testing; S15, re-evaluation decision.

Description

Medicine market scale prediction system based on machine learning
Technical Field
The invention relates to the technical field of medical big data, in particular to a medicine market scale prediction system based on machine learning.
Background
The scale of the drug market is to a considerable degree unknown and uncertain. Changes in purchase quantity are influenced by many factors, such as medical institutions, drug characteristics, medical insurance, market competition, and drug policies, and predictions of drug purchase quantities based on individual experience are severely limited. This shows in the following aspects.
First, when formulating a new round of drug purchasing plans and reporting quantities for national centralized procurement, medical institutions lack scientific and reasonable drug market data support. They simply derive predicted purchase quantities by applying coefficients to past purchasing data and base their purchasing plans on these values. As a result, some drugs are overstocked in warehouses while others run short, and institutions that miss national centralized procurement targets may be summoned for talks or fail performance evaluations, which can trigger a one-vote veto of centralized procurement incentives. None of this favors the rational use and allocation of medical resources.
Second, when formulating drug market policies such as volume-based price negotiations, centralized procurement schemes, centralized procurement incentive schemes, and drug payment budgets, the relevant government departments rely mainly on data reported by supervised units such as medical institutions. Without drug market data as a decision-support tool to correct biases in the existing data, their bargaining leverage in volume-based negotiations is weakened, and the risk rises of obstructed centralized procurement, incentive measures that do not match actual purchasing, and waste of medical insurance funds.
Third, when formulating enterprise plans such as drug production plans, drug sales plans, commercial medical insurance risk assessment schemes, and market development strategies, enterprises mainly rely on cross-sectional drug market data from third-party market assessment institutions. This ignores the longitudinal characteristics of drug data and lacks market data that combine a global view with fine granularity, leading to inaccurate market positioning, lagging drug output, excess capacity, and greater uncertainty in commercial medical insurance risk.
Disclosure of Invention
To solve these technical problems, the invention provides a machine learning-based drug market scale prediction system that can predict trends in drug market purchase quantities, assist in formulating drug market purchasing plans, help users understand market positioning, and assist in monitoring and evaluating rational drug use.
The machine learning-based drug market scale prediction system of the invention comprises the following steps:
S1, data requirements: based on the user's needs for drug market scale prediction, combine any existing historical modeling experience with known problem solutions to form a comprehensive data requirement;
S2, purchase-quantity-related data: according to the data requirement, retrieve the purchase-quantity data from the drug transaction database and store them in a structured standard data table;
S3, data cleaning: mark the data and bring the valid data into the model;
S4, data not included: inspect the data not included in the model, investigate why they fail the inclusion criteria, and uncover possible data problems; the data included in the model proceed to multi-dimensional statistics;
S5, multi-dimensional statistics: compute statistics on the included data in terms of drug attributes, hospital attributes, market competition, sales volume and price, and so on;
S6, index library: structure the multi-dimensional statistical results for each region and store them in an index standard database;
S7, random grouping: according to a set allocation ratio and based on the unique codes of the medical institutions, assign the data of some medical institutions to the test set and the data of the remaining institutions to the training set for model fitting;
S8, variable importance evaluation: evaluate the importance of all independent variables with a random forest model. For the regression problem of predicting drug purchase quantity, use the mean square error increase rate (%IncMSE) as the importance index; for the classification problem of predicting the purchase-quantity multiplier rating, use the mean decrease in accuracy (MDA);
S9, prediction model building;
S10, check whether all variables have been traversed: check whether the loop has covered all independent variables in the training set; if not, continue the loop; if so, end the loop and select the optimal model from the set of prediction models according to goodness of fit or accuracy;
S11, model evaluation;
S12, expert evaluation: experts in the relevant fields evaluate and analyze the model's predictions based on their experience and reference data, propose modifications, and assess the model's practicability;
S13, practicability judgment: if the prediction model has not reached the practical stage, return to the data requirement stage according to the modification suggestions and evaluation results to guide the next modeling scheme; if it has, store it in the prediction model database;
S14, real-environment testing;
S15, re-evaluation decision: decide from the real-environment test results whether the model needs re-evaluation. If so, return to the model evaluation stage, revise the modeling scheme according to the expert evaluation results, and enter the next modeling stage; if not, incorporate the prediction model into the drug transaction monitoring system.
Further, the step S3 includes the following steps:
1) invalid data in the data to be cleaned, such as invalid orders, data of unknown source, and erroneous data, are flagged as invalid;
2) drug codes are used to link to the established drug information standard library and to mark drug attributes such as catalog generic name, catalog dosage form, standard specification, standard manufacturer name, essential-drug status, medical insurance category, and defined daily dose (DDD);
3) hospital codes are used to link to the established medical institution information standard library and to mark institution attributes such as grade rating, administrative region, and primary-care classification;
4) missing-value fill rules are set for all required fields and refined as problems are fed back from the modeling process, and missing values in the data to be cleaned are filled according to these rules;
5) in line with the inclusion criteria, invalid data, data whose required fields cannot be filled, data outside the statistical time range, and any other data that modeling experience indicates should be excluded are marked as not included in the model; all other data are marked as included in the model.
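A minimal sketch of the marking logic in steps 1), 4) and 5), assuming a simplified record layout. The `OrderRecord` fields, the `clean` helper and the `min_quantity` cut-off are hypothetical; the library joins of steps 2) and 3) are omitted, and the fill rule (qualitative to "Other", quantitative to 0) is borrowed from the city A example later in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class OrderRecord:
    """Hypothetical purchase-order record; field names are illustrative only."""
    drug_code: Optional[str]
    hospital_code: Optional[str]
    order_date: Optional[date]
    quantity: float
    hospital_grade: Optional[str] = None            # qualitative field
    consistency_passed_count: Optional[int] = None  # quantitative field
    flags: Set[str] = field(default_factory=set)
    included: bool = False


def clean(records: List[OrderRecord], window_start: date, window_end: date,
          min_quantity: float = 1.0) -> List[OrderRecord]:
    """Flag invalid records, fill missing values and mark model inclusion (in place)."""
    for r in records:
        # Step 1): invalidation marking (zero quantity or missing key fields).
        if (r.quantity <= 0 or not r.drug_code or not r.hospital_code
                or r.order_date is None):
            r.flags.add("invalid")
        # Step 4): assumed fill rules (qualitative -> "Other", quantitative -> 0).
        if r.hospital_grade is None:
            r.hospital_grade = "Other"
        if r.consistency_passed_count is None:
            r.consistency_passed_count = 0
        # Step 5): inclusion criteria (valid, inside the time window, not too small).
        in_window = (r.order_date is not None
                     and window_start <= r.order_date <= window_end)
        r.included = ("invalid" not in r.flags and in_window
                      and r.quantity >= min_quantity)
    return records
```

Keeping the invalid flag separate from the inclusion flag mirrors S4: excluded but valid records (for example, out-of-window ones) can still be inspected for data problems.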
Further, the step S9 includes the following steps:
1) based on the importance evaluation results, sort the independent variables in descending order of the importance index to obtain the set $F = \{x_1, x_2, x_3, \ldots\}$; in the loop, take the first $i$ elements of $F$ in turn, so that
$$f_i = \{x_1, x_2, \ldots, x_i\}$$
and in each iteration use all elements of $f_i$ as the independent-variable combination for building the model;
2) for the regression problem of predicting drug purchase quantity, a random forest regressor and a ridge regression model are used for regression analysis, and when the time series data are complete, time series analysis methods are used to optimize the model; for the classification problem of predicting the purchase-quantity multiplier rating, a random forest classifier is used for classification analysis;
3) tune the model parameters where needed, evaluate the goodness of fit or accuracy of each model, and output the prediction model.
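The nested-subset loop of S9 and S10 can be sketched independently of any modelling library. In this sketch `score` is a hypothetical caller-supplied function that fits a model on the given variables and returns its goodness of fit, accuracy or RMSE; the helper name `nested_subset_search` is also illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def nested_subset_search(
    ranked: Sequence[str],
    score: Callable[[List[str]], float],
    higher_is_better: bool = True,
) -> Tuple[int, Optional[float]]:
    """Evaluate f_i = first i ranked variables for i = 1..len(ranked).

    Returns the best i and its score; pass higher_is_better=False for RMSE.
    """
    best_i, best_score = 0, None
    for i in range(1, len(ranked) + 1):
        f_i = list(ranked[:i])  # f_i = {x_1, ..., x_i}
        s = score(f_i)          # fit and evaluate a model on f_i
        if (best_score is None
                or (higher_is_better and s > best_score)
                or (not higher_is_better and s < best_score)):
            best_i, best_score = i, s
    return best_i, best_score
```

Because the variables are pre-sorted by importance, only `len(ranked)` models are fitted instead of every possible combination.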
Further, the step S11 includes the following steps:
1) use the selected model and the test set data to obtain predictions for the test set;
2) for regression analysis, assess agreement with the Bland-Altman method and compare the differences with an acceptable error threshold provided by the user's requirements; for classification analysis, assess agreement with ten-fold cross-validation and a confusion matrix and compare the accuracy with an accuracy threshold provided by the user's requirements;
3) compare the extrapolation performance of the two models, analyze their respective strengths and weaknesses, and form the model evaluation result.
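The patent names the Bland-Altman method but not the exact decision rule. The sketch below computes the mean difference (predicted minus actual) and the usual 95% limits of agreement (mean ± 1.96 SD). Treating the model as acceptable when both limits lie within ±threshold is an assumption, and `bland_altman` is a hypothetical helper name.

```python
import statistics
from typing import Dict, Sequence


def bland_altman(predicted: Sequence[float], actual: Sequence[float],
                 threshold: float) -> Dict[str, object]:
    """Bland-Altman agreement between predictions and actual values."""
    diffs = [p - a for p, a in zip(predicted, actual)]  # predicted - actual
    mean_diff = statistics.mean(diffs)                  # systematic bias
    sd = statistics.stdev(diffs)                        # needs at least two pairs
    lower, upper = mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
    return {
        "mean_diff": mean_diff,
        "limits": (lower, upper),
        # Assumed rule: both limits of agreement within +/- threshold.
        "acceptable": abs(lower) <= threshold and abs(upper) <= threshold,
    }
```

A positive `mean_diff` means the model over-predicts on average.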
Further, the step S14 includes the following steps:
1) use the prediction model, together with the drug catalog information for the new purchasing cycle and the drug transaction database, to forecast drug usage in the new cycle;
2) the predicted purchase quantities may be adjusted to a reasonable extent according to the user's purchasing behavior data and requirements;
3) present the predictions to the user, who may request modifications based on experience, thereby updating the user requirements;
4) when the user has no further modification requests, store the adjusted model in the model database;
5) when the new drug purchasing cycle ends, compare the predicted values with the actual values and perform a deviation analysis to reach a re-evaluation conclusion and modification suggestions for the model.
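For the deviation analysis in step 5), one simple option, offered here only as an illustration and not as the patent's procedure, is a per-item relative deviation with a flag for items outside a tolerance. The helper name `deviation_report` and the 20% default tolerance are hypothetical placeholders.

```python
from typing import Dict, Tuple


def deviation_report(predicted: Dict[str, float], actual: Dict[str, float],
                     tolerance: float = 0.2) -> Dict[str, Tuple[float, bool]]:
    """Relative deviation (pred - actual) / actual per drug/hospital key.

    Keys without a usable actual value (missing or zero) are skipped.
    """
    report = {}
    for key, pred in predicted.items():
        act = actual.get(key)
        if not act:
            continue  # no comparable true value for this key
        rel = (pred - act) / act
        report[key] = (rel, abs(rel) > tolerance)  # (deviation, out of tolerance?)
    return report
```

The flagged keys give experts a short list to examine when drafting modification suggestions.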
Further, in step S5, the drug attributes include essential-drug classification, medical insurance classification, purchase amount of the ATC group, route of administration, and the like; the hospital attributes include medical institution grade rating, drug purchasing scale, primary-care classification, administrative region, and the like; market competition includes the number of competing enterprises, the market share of import enterprises, the number of enterprises that have passed the generic consistency evaluation, the number of enterprises on the Ministry of Industry and Information Technology top-100 list and their market share, and the like; and sales volume and price include the purchase quantity and purchase amount of the drug in the previous purchasing cycle, the trend growth rate of its purchase quantity (if the time series of the data is complete), the weighted average, standard deviation, range, median, maximum, minimum, and so on of its price, and the purchase quantity and purchase-quantity multiplier rating in the target purchasing cycle.
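The price and trend indicators listed for S5 are plain descriptive statistics. The sketch below uses only the standard library. Two points are assumptions: the mean is weighted by purchase quantity, while the SD is the unweighted sample SD; and the patent does not define "trend growth rate", so a compound per-period growth rate is used as one plausible reading. Both helper names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def price_statistics(prices: Sequence[float],
                     quantities: Sequence[float]) -> Dict[str, float]:
    """Descriptive price statistics for one drug; mean weighted by quantity."""
    weighted_mean = sum(p * q for p, q in zip(prices, quantities)) / sum(quantities)
    return {
        "weighted_mean": weighted_mean,
        "stdev": statistics.stdev(prices),  # unweighted sample SD
        "range": max(prices) - min(prices),
        "median": statistics.median(prices),
        "max": max(prices),
        "min": min(prices),
    }


def trend_growth_rate(quantities_by_period: Sequence[float]) -> float:
    """Assumed definition: compound average growth rate per period."""
    first, last = quantities_by_period[0], quantities_by_period[-1]
    periods = len(quantities_by_period) - 1
    return (last / first) ** (1 / periods) - 1
```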
Compared with the prior art, the invention has the beneficial effects that:
Predicting trends in drug market purchase quantities: the invention can reveal how drug market purchase quantities change over a given period and, through evidence from drug purchasing data, provides theoretical support for drug market supervision, drug production, drug sales, drug purchasing, pharmacoeconomic research, and the like.
Assisting in formulating drug market purchasing plans: the system can help medical institutions draw up reasonable drug purchasing plans and guide their reporting of purchase quantities for alliance centralized procurement. It avoids large amounts of drugs being left unused and scrapped because reported quantities were too high, and reduced support under the matching alliance procurement incentive policy because reported quantities were too low, thereby promoting standardized and rational drug purchasing by medical institutions. The invention can also help the relevant administrative departments formulate alliance centralized procurement schemes, improve the reasonableness of volume-based purchasing, and draw up more refined medical insurance budgets, which supports the implementation of DRG-based medical insurance payment.
Helping users understand market positioning: the invention can help drug manufacturers understand market demand, formulate production plans in terms of drug attributes, time, and other dimensions, allocate production resources reasonably, avoid lagging output and excess capacity, and make production more responsive to changes in demand. It can also help drug distributors understand market positioning, formulate scientific, reasonable, and refined market deployment strategies, improve the dynamic management of drug warehouses, and avoid mismatches between sales and demand as well as missed market opportunities. It can further help commercial medical insurance companies assess drug market risk, promote refined management of commercial drug insurance schemes, and reduce the risk arising from unknown and uncertain changes in the drug market.
Assisting in monitoring and evaluating rational drug use: the invention can help users such as medical institutions and regulatory departments evaluate interventions aimed at rational drug use by reconstructing the trend in the quantity of monitored drugs that would have occurred without the intervention. This removes the interference of other influencing factors from the evaluation, improves the accuracy and rigor of evaluation schemes for rational drug use interventions, and prevents the actual effect of interventions from being exaggerated or underestimated.
Drawings
FIG. 1 is a general flow chart of a drug market size forecasting system;
FIG. 2 is a flow diagram of a drug market size forecasting system;
FIG. 3 is a sub-flow diagram of data cleansing;
FIG. 4 is a flow chart of predictive model building;
FIG. 5 is a flow chart of model evaluation;
FIG. 6 is a flow chart of a real environment test;
FIG. 7 is a schematic diagram of regression model independent variable importance evaluation;
FIG. 8 is a schematic illustration of a classification model independent variable importance evaluation;
FIG. 9 is a graph of regressor decision tree number versus error;
FIG. 10 is a graph of regressor node values versus out-of-bag error;
FIG. 11 is a graph of classifier decision tree number versus error;
FIG. 12 is a graph of classifier node values versus out-of-bag errors;
FIG. 13 is a graph of ridge regression nPC values versus coefficients;
FIG. 14 is a graph of the number of regressor arguments versus RMSE;
FIG. 15 is a graph of the number of classifier arguments versus accuracy;
FIG. 16 is a graph of the effects of a regressor fit;
FIG. 17 is a plot of regressor Bland-Altman consistency assessment;
FIG. 18 is a multi-dimensional scale analysis diagram of a classifier;
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example:
Taking city A as an example, the research sample consists of the data related to the first batch of national centralized procurement (including its expansion) and the second batch of national centralized procurement in that region, as follows:
1. Data requirements: based on the user's needs for reporting national centralized procurement quantities, order detail data for drug purchases by medical institutions in city A, including medical institution codes, drug codes, order time, order quantity, order amount, and so on, need to be retrieved; this forms the data requirement.
2. Reporting-related data: according to the data requirement for national centralized procurement reporting, the order detail data of drug purchases by medical institutions in city A are retrieved from the drug transaction database.
3. Data cleaning:
1) invalid data in city A's data to be cleaned, such as orders with zero purchase quantity, orders flagged as erroneous during verification by medical institutions, offline purchases with inaccurate drug information, and records lacking a drug code, hospital code, or purchase time, are flagged as invalid;
2) drug codes and hospital codes are used to link to the established information standard libraries, and drug attributes and hospital attributes are marked;
3) according to the fill rules, for fields such as medical institution rating, essential-drug and medical insurance category, and number of enterprises that have passed the consistency evaluation, missing qualitative values are filled with "Other" and missing quantitative values with 0;
4) in city A's data, invalid data, data outside the one-year window before the start of each procurement batch, and purchase records with excessively small purchase quantities are marked as not included in the model; all other data are marked as included in the model.
4. Data not included: the data from city A that were not included in the model were inspected; all of the excluded records fall outside the statistical time range, so there is no data quality problem. The included data proceed to multi-dimensional statistics.
5. Multi-dimensional statistics: the data from city A included in the model are summarized in terms of drug attributes, hospital attributes, market competition, sales volume and price, and so on. The drug attributes include essential-drug classification, medical insurance classification, purchase amount of the ATC group, route of administration, and the like; the hospital attributes include medical institution grade rating, drug purchasing scale, primary-care classification, administrative region, and the like; market competition includes the number of competing enterprises, the market share of import enterprises, the number of enterprises that have passed the generic consistency evaluation, the number of enterprises on the Ministry of Industry and Information Technology top-100 list and their market share, and the like; sales volume and price include the purchase quantity and purchase amount of the drug before centralized procurement, the trend growth rate of its purchase quantity, the weighted average, standard deviation, range, median, maximum, minimum, and so on of its pre-procurement price, and the purchase quantity and purchase-quantity multiplier rating during the execution period of national centralized procurement.
6. Index library: the multi-dimensional statistical results for city A are structured and stored in the index standard database, giving 1,453 records for city A.
7. Random grouping: a random number is generated from the system time, and the index data of city A are allocated to the training set and the test set in a ratio of 0.8:0.2. Based on the unique codes of the medical institutions, 276 drug purchasing records from 27 medical institutions are randomly drawn as the test set, and the 1,177 drug purchasing records from the remaining 128 medical institutions form the training set for model fitting.
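Grouping by the unique institution code means all records of one institution land in the same set, so test performance is measured on unseen institutions. A minimal sketch follows; `split_by_institution` is a hypothetical helper, and seeding from `time.time_ns()` only mirrors the "random number from system time" wording. Because the ratio is applied to institutions rather than records, the record share of the test set only approximates 0.2 (in the example, 276 of 1,453 records is about 19%).

```python
import random
import time
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, object]  # (institution_code, record)


def split_by_institution(rows: Iterable[Row], test_ratio: float = 0.2,
                         seed: Optional[int] = None) -> Tuple[List[Row], List[Row]]:
    """Split rows so that no institution appears in both training and test sets."""
    by_inst = defaultdict(list)
    for code, rec in rows:
        by_inst[code].append(rec)
    if seed is None:
        seed = time.time_ns()  # seed from the system clock
    rng = random.Random(seed)
    codes = sorted(by_inst)    # deterministic order before shuffling
    rng.shuffle(codes)
    test_codes = set(codes[:round(len(codes) * test_ratio)])
    train = [(c, r) for c in codes if c not in test_codes for r in by_inst[c]]
    test = [(c, r) for c in codes if c in test_codes for r in by_inst[c]]
    return train, test
```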
8. Evaluation of variable importance:
1) When the dependent variable is the purchase quantity of centrally procured drugs in city A, a random forest regression model is first built with all independent variables. The mean square error (MSE) serves as the evaluation index of the random forest regressor, the contribution of an independent variable to reducing the MSE is expressed as the mean square error increase rate (%IncMSE), and %IncMSE is used as the importance index of the independent variable, calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\%\mathrm{IncMSE}_j = \frac{\Delta\mathrm{MSE}_j}{\mathrm{MSE}} \times 100\%$$
where MSE is the mean square error of the model, $n$ is the number of samples, $i$ is the sample index, $y_i$ is the actual purchase quantity of the centrally procured drug, $\hat{y}_i$ is the predicted purchase quantity, $\%\mathrm{IncMSE}_j$ is the mean square error increase rate of the $j$-th independent variable, and $\Delta\mathrm{MSE}_j$ is the increase in MSE when the original values of the $j$-th independent variable are replaced by random values;
the independent variable importance results are obtained from the preliminary random forest regression model for city A (see FIG. 7);
2) when the dependent variable is the purchase-quantity multiplier rating in city A, a random forest classification model is first built with all independent variables; accuracy serves as the evaluation index of the random forest classifier, and the importance of each independent variable is evaluated by the mean decrease in accuracy (MDA);
the independent variable importance results are obtained from the preliminary random forest classification model for city A (see FIG. 8);
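Both importance measures in this step are permutation-based: shuffle one independent variable, re-predict, and measure how much the metric worsens. The sketch below is model-agnostic; `predict` is any fitted model's prediction function. %IncMSE is computed as 100 · ΔMSE / MSE and MDA as the drop in accuracy. Implementations such as the R randomForest package instead compute these on out-of-bag samples tree by tree and, by default, scale by the standard deviation of the differences, so their numbers will differ. All helper names are hypothetical.

```python
import random
from typing import List, Sequence

Row = List[float]


def mse(y: Sequence[float], yhat: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def accuracy(y: Sequence, yhat: Sequence) -> float:
    return sum(a == b for a, b in zip(y, yhat)) / len(y)


def _permuted_scores(predict, X: List[Row], y, metric, j: int,
                     n_repeats: int, seed: int):
    """Metric on the original data and mean metric after shuffling column j."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    total = 0.0
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between x_j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        total += metric(y, [predict(row) for row in X_perm])
    return base, total / n_repeats


def pct_inc_mse(predict, X, y, j, n_repeats=5, seed=0) -> float:
    """%IncMSE_j = 100 * (MSE_permuted - MSE) / MSE; requires MSE > 0."""
    base, permuted = _permuted_scores(predict, X, y, mse, j, n_repeats, seed)
    return 100.0 * (permuted - base) / base


def mean_decrease_accuracy(predict, X, y, j, n_repeats=5, seed=0) -> float:
    """MDA_j = original accuracy minus mean accuracy after permuting x_j."""
    base, permuted = _permuted_scores(predict, X, y, accuracy, j, n_repeats, seed)
    return base - permuted
```

A variable the model ignores scores about zero on both measures, which is why low-ranked variables sit at the end of the ordered set used in the next step.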
9. Building the prediction model:
1) based on the importance evaluation results for city A's training set, sort the independent variables in descending order of importance to obtain the set F = {pre-collection purchase amount, trend growth rate, …}; initialize the iteration counter i = 1, take the first i elements of F in turn to obtain the subset fi, and in each iteration use all elements of fi as the independent-variable combination for building the model;
2) for the regression problem of predicting drug purchase quantity, a random forest regressor and a ridge regression model are used for regression analysis; for the classification problem of predicting the purchase-quantity multiplier rating, a random forest classifier is used for classification analysis;
3) the random forest models require tuning of the number of decision trees (ntree) and the number of features sampled at each split (mtry, the node value), and the ridge regression model requires tuning of the parameter k (nPC). The tuning results for city A during the loop are as follows:
in the random forest regression model, the error decreases as the number of decision trees increases; FIG. 9 shows that the model error is essentially stable at 800 trees, and FIG. 10 shows that the out-of-bag error is smaller when 6 features are sampled. In the random forest classification model, FIG. 11 shows that the model error is essentially stable at 1,000 trees, and FIG. 12 shows that the out-of-bag error is smaller when 3 features are sampled;
in the ridge regression model, FIG. 13 shows that the coefficient of each variable is essentially stable when the k value (nPC) is 9, and k is taken as the minimum value satisfying the following condition:
[Formula image not recoverable: selection condition for k]
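4) For the random forest regression model and the ridge regression model, the goodness-of-fit statistic $R^2$ and the root mean square error (RMSE) are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $\bar{y}$ is the mean of the actual purchase quantities.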
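For the random forest classification model, Accuracy and the kappa coefficient are calculated as follows:
$$\mathrm{Accuracy} = p_o = \frac{1}{n}\sum_{k=1}^{K} n_{kk}$$
$$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \frac{1}{n^2}\sum_{k=1}^{K} n_{k+}\, n_{+k}$$
where $K$ is the number of classes, $n_{kk}$ is the number of samples of class $k$ classified correctly, and $n_{k+}$ and $n_{+k}$ are the actual-class (row) and predicted-class (column) totals of the confusion matrix for class $k$.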
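The four statistics named in item 4) ($R^2$, RMSE, accuracy and kappa) can be computed without a modelling library, as in the sketch below. The helper names are hypothetical. The kappa shown is Cohen's kappa, with chance agreement taken from the actual-class and predicted-class totals.

```python
import math
from typing import Hashable, Sequence, Tuple


def r_squared(y: Sequence[float], yhat: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def accuracy_and_kappa(y_true: Sequence[Hashable],
                       y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Accuracy (p_o) and Cohen's kappa from paired actual/predicted labels."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # p_e: chance agreement from actual (row) and predicted (column) totals.
    p_e = sum(
        (sum(t == c for t in y_true) / n) * (sum(p == c for p in y_pred) / n)
        for c in labels
    )
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa
```

Kappa is reported alongside accuracy because, with unbalanced rating classes, a classifier can reach moderate accuracy largely by chance agreement.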
10. Whether all variables are traversed: let I be the total number of independent variables in the training set. While i < I, set i = i + 1 and rebuild the model; otherwise, select the prediction models from the regressors and the classifiers, taking the regressor with the smallest RMSE and the classifier with the highest accuracy as the optimal prediction models.
In the regression model, FIG. 14 shows that the random forest regressor has the smallest RMSE when i = 18.
In the classification model, FIG. 15 shows that the accuracy of the random forest classifier stays high when i < 19; forward selection is then used to remove independent variables whose introduction lowers the accuracy.
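The forward selection used for the classifier can be sketched as a greedy pass over the importance-ranked variables. A variable is kept only if the accuracy does not drop. Keeping it on a tie (`>=`) is an assumed tie-break, and `accuracy_of` is a hypothetical caller-supplied function that fits and evaluates a classifier on the given variables.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    ranked: Sequence[str],
    accuracy_of: Callable[[List[str]], float],
) -> Tuple[List[str], Optional[float]]:
    """Add variables in importance order; keep one only if accuracy does not drop."""
    selected: List[str] = []
    best: Optional[float] = None
    for var in ranked:
        candidate = selected + [var]
        acc = accuracy_of(candidate)  # caller fits/evaluates a classifier
        if best is None or acc >= best:
            selected, best = candidate, acc
        # otherwise the variable is discarded because it lowered accuracy
    return selected, best
```

Unlike the nested-subset loop, this pass can skip a harmful variable in the middle of the ranking and still consider the ones after it.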
11. And (3) model evaluation:
1) obtaining a prediction result of the test set by utilizing the screened random forest regressor and random forest classifier prediction models and combining the test set data of the urban area A;
2) for a random forest regressor, the test results show goodness-of-fit statistics R2The fitting effect is 0.744, the fitting effect is shown (fig. 16), the RMSE is 57103.93, the Bland-Altman consistency evaluation shows that the mean difference value is 2790.21, and (fig. 17) shows that the difference values of the predicted value and the actual value are more concentrated, the predicted value is higher than the actual value on the whole, and an extreme value with larger prediction error exists; for the random forest classifier, the test result shows that the accuracy is 54.4%, the 95% confidence interval is (0.495,0.594), the P is 0.002, the kappa coefficient is 0.221, the paired chi-square test shows that the P is 0.058, the prediction error of the three types of purchase quantity multiplying power ratings of 'below 1 time', '1-3 times' and 'above 3 times' is the largest, and the classification error rate is that72.5%, the classification error rate of "1-3 times" was 36.0%, the classification error rate of "1 times or less" was 40.8%, and the multidimensional scaling analysis results (FIG. 18) showed that the similarity of the three classes of packets was relatively close;
3) of the two models built on the City A data, the classifier extrapolates poorly while the regressor extrapolates well, so integrating the two models to combine their strengths was not considered.
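The regressor's test-set statistics in item 2) can be computed as follows. This is a minimal sketch, assuming R² is defined as 1 - SS_res/SS_tot and the Bland-Altman difference as predicted minus actual (so a positive mean difference means over-prediction, which matches the reported mean difference of 2790.21 and predictions running high). The 95% limits of agreement (mean difference ± 1.96 SD) are the usual Bland-Altman convention, not a figure the text reports; the function name and toy data are hypothetical.

```python
import math
import statistics


def regression_agreement(predicted: list[float], actual: list[float]) -> dict[str, float]:
    """RMSE, R^2 and Bland-Altman statistics for predicted vs. actual volumes."""
    if len(predicted) != len(actual) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    diffs = [p - a for p, a in zip(predicted, actual)]  # > 0 means over-prediction
    ss_res = sum(d * d for d in diffs)
    mean_actual = statistics.fmean(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    bias = statistics.fmean(diffs)   # Bland-Altman mean difference
    sd = statistics.stdev(diffs)     # sample SD of the differences
    return {
        "rmse": math.sqrt(ss_res / len(actual)),
        "r2": 1 - ss_res / ss_tot,   # assumes actual values are not all equal
        "mean_diff": bias,
        "loa_lower": bias - 1.96 * sd,  # 95% limits of agreement
        "loa_upper": bias + 1.96 * sd,
    }


stats = regression_agreement([110, 190, 320, 405], [100, 200, 300, 400])
print(stats["rmse"], round(stats["r2"], 4), stats["mean_diff"])  # 12.5 0.9875 6.25
```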
12. Expert evaluation: clinical pharmacy experts and pharmacoeconomics experts evaluate and analyze the model's prediction results based on their experience and reference data, suggest modifications concerning the soundness of the included independent variables, the influence of latent variables, shortcomings of the modeling method, and similar aspects, and assess the practicality of the model.
13. Practicality judgment: if the prediction model is not yet ready for practical use, return to the data requirement generation stage according to the modification suggestions and evaluation results, consider adding or removing independent variables as appropriate or switching to other prediction models or combined models, and start the next cycle of building the reported-volume prediction model; if the prediction model is ready for practical use, store it in the prediction model database.
14. Real-environment test: the volume reporting for City A under the fifth batch of national centralized drug procurement serves as the test environment for the existing model. In cooperation with the user, the prediction model is refined and personalized to some extent based on data related to the user's drug purchasing behavior, such as the drug purchasing budget, and the deviation between predicted and actual values for the fifth batch is evaluated.
The predicted reported volumes for City A under the fifth batch of national centralized procurement are as follows: for the promethazine oral sustained-release dosage form, the predicted purchase volume is 33282 with a predicted increase rate of -16.79% for Hospital A, and 5090 with a predicted increase rate of 3.89% for Hospital B; for the propranolol oral sustained-release preparation, Hospital C has a pre-procurement purchase volume of 9062, a predicted purchase volume of 14611, and a predicted increase rate of 61.23%, and Hospital D has a pre-procurement purchase volume of 937, a predicted purchase volume of 14611, and a predicted increase rate of 245.88%.
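The Hospital C increase rate above is consistent with measuring the change relative to the pre-procurement volume: (14611 - 9062) / 9062 ≈ 61.23%. A minimal sketch of that arithmetic, assuming this definition of the increase rate; `increase_rate` is a hypothetical helper name.

```python
def increase_rate(pre_procurement: float, predicted: float) -> float:
    """Predicted change relative to the pre-procurement volume (0.10 means +10%)."""
    if pre_procurement <= 0:
        raise ValueError("pre-procurement volume must be positive")
    return (predicted - pre_procurement) / pre_procurement


# Propranolol oral sustained-release preparation, Hospital C (figures from the text)
print(f"{increase_rate(9062, 14611):.2%}")  # 61.23%
```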
15. Reevaluation decision: judge from the deviation in the fifth-batch volume predictions whether the model needs to be reevaluated. If so, return to the model evaluation stage, where experts identify the sources of the prediction deviation from the real-environment test results and propose improvements, and then start the next modeling stage; if not, incorporate the prediction model into the centralized procurement monitoring system.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A drug market size forecasting system based on machine learning, comprising the steps of:
S1, data requirements: based on the user's needs for drug market size prediction, and, where historical modeling experience exists, combining that experience with its corresponding problem solutions, consolidate a comprehensive set of data requirements;
S2, purchase-volume data: according to the data requirements for drug market size prediction, retrieve data related to the reported volumes from a drug transaction database and store it in a structured standard data table;
S3, data cleaning: mark the data and include the valid data in the model;
S4, data inclusion check: inspect the data not included in the model, investigate in depth why it fails the inclusion criteria, and uncover possible data problems; carry out multi-dimensional statistics on the data included in the model;
S5, multi-dimensional statistics: carry out multi-dimensional statistics on the data included in the model in terms of drug attributes, hospital attributes, market competition, sales volume and price, and the like;
S6, index library: structure the multi-dimensional statistical results for each region and store them in an index standard database;
S7, random grouping: at a given allocation ratio and based on the unique code of each medical institution, assign the data of some medical institutions to a test set and the data of the remaining medical institutions to a training set for model fitting;
S8, variable importance evaluation: evaluate the importance of all independent variables with a random forest model; for the regression problem of predicting drug purchase volume, use the percentage increase in mean squared error (%IncMSE) as the importance index; for the classification problem of predicting the purchase-volume multiplier rating, use the mean decrease in accuracy (MDA);
S9, prediction model building, comprising the following steps:
1) using the importance evaluation results from S8, sort the independent variables in descending order of importance to obtain the independent variable set $F = \{x_1, x_2, x_3, \dots\}$; in the i-th iteration of the loop, take the first i elements of F to form $f_i = \{x_1, x_2, \dots, x_i\}$, and use all elements of $f_i$ as the independent-variable combination for building the model;
2) for the regression problem of predicting drug purchase volume, perform regression analysis with a random forest regressor and a ridge regression model, and, when complete time-series data are available, refine the model with time-series analysis methods; for the classification problem of predicting the purchase-volume multiplier rating, perform cluster analysis with a random forest classifier model;
3) tune the parameters of some of the models, evaluate the goodness of fit or accuracy of the models, and output the prediction models;
S10, check whether all variables have been traversed: check whether the loop has passed through all independent variables in the training set; if not, continue the loop; if so, end the loop and select the optimal model from the set of prediction models according to goodness of fit or accuracy;
S11, model evaluation;
S12, expert evaluation: experts in the relevant fields evaluate and analyze the model's prediction results based on their experience and reference data, provide modification suggestions, and evaluate the practicality of the model;
S13, practicality judgment: if the prediction model is not yet ready for practical use, return to the data requirement generation stage according to the modification suggestions and evaluation results to guide the next model-building scheme; if it is ready for practical use, store it in a prediction model database;
S14, real-environment test;
S15, reevaluation decision: judge from the real-environment test results whether the model needs to be reevaluated; if so, return to the model evaluation stage, revise the modeling scheme according to the expert evaluation results, and start the next modeling stage; if not, incorporate the prediction model into the drug transaction monitoring system.
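Step S7 splits the data by medical institution rather than by row, so a hospital's records never appear in both the training and test sets. This is a minimal sketch using only the standard library, assuming each record is a dict carrying a hypothetical `institution_code` field and treating the split ratio and random seed as free parameters, since the claim leaves the ratio unspecified.

```python
import random


def split_by_institution(
    records: list[dict],
    test_fraction: float,
    seed: int = 0,
    key: str = "institution_code",  # hypothetical field name
) -> tuple[list[dict], list[dict]]:
    """Put all records of a randomly chosen share of institutions into the test set."""
    codes = sorted({r[key] for r in records})  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(codes)
    test_codes = set(codes[: round(len(codes) * test_fraction)])
    train = [r for r in records if r[key] not in test_codes]
    test = [r for r in records if r[key] in test_codes]
    return train, test


rows = [{"institution_code": f"H{h}", "qty": q} for h in range(10) for q in range(3)]
train, test = split_by_institution(rows, test_fraction=0.3, seed=42)
print(len(train), len(test))  # 21 9
```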
2. The machine learning-based pharmaceutical market size prediction system of claim 1, wherein the step S3 comprises the steps of:
1) mark as invalid any invalid data in the data to be cleaned, such as invalid order data, data of unknown source, and erroneous data;
2) link to the established drug information standard library via drug codes, and mark drug attributes such as catalog generic name, catalog dosage form, standard specification, standard manufacturer name, essential-drug status, medical insurance status, and defined daily dose;
3) link to the established medical institution information standard library via hospital codes, and mark institution attributes such as grade, administrative region, and primary-care classification;
4) set missing-value imputation rules for all required fields, update and refine these rules as problems are found during modeling, and use them to fill in missing values in the data to be cleaned;
5) according to the inclusion criteria, mark as not included in the model any invalid data, data whose required fields cannot be filled in, data outside the statistical time range, and other data that modeling experience indicates should be excluded; mark all other data as included in the model.
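The cleaning and inclusion marking of claim 2 can be sketched as a single pass over the records. This is a hedged illustration: the field names (`drug_code`, `hospital_code`, `quantity`, `order_date`, `is_valid`), the dict-based standard libraries, and the imputation-rule callables are all hypothetical. The library lookup runs after imputation here so the codes it relies on are already filled in.

```python
from datetime import date
from typing import Callable, Optional

# Hypothetical schema; the patent does not name its fields.
REQUIRED = ("drug_code", "hospital_code", "quantity", "order_date")


def clean_records(
    records: list[dict],
    drug_library: dict[str, dict],
    hospital_library: dict[str, dict],
    fill_rules: dict[str, Callable[[dict], Optional[object]]],
    start: date,
    end: date,
) -> list[dict]:
    cleaned = []
    for raw in records:
        rec = dict(raw)  # work on a copy, leave the input untouched
        reason = None
        if not rec.get("is_valid", True):  # 1) invalidation mark
            reason = "invalid data"
        if reason is None:
            for field in REQUIRED:  # 4) rule-based filling of required fields
                if rec.get(field) is None and field in fill_rules:
                    rec[field] = fill_rules[field](rec)
                if rec.get(field) is None:
                    reason = f"missing {field}"
                    break
        if reason is None and not start <= rec["order_date"] <= end:
            reason = "outside statistical time range"
        if reason is None:  # 2)-3) attach standard-library attributes
            rec.update(drug_library.get(rec["drug_code"], {}))
            rec.update(hospital_library.get(rec["hospital_code"], {}))
        rec["included"] = reason is None  # 5) inclusion mark
        rec["exclusion_reason"] = reason
        cleaned.append(rec)
    return cleaned


rows = [
    {"drug_code": "D1", "hospital_code": "H1", "quantity": 10, "order_date": date(2021, 3, 1)},
    {"drug_code": "D1", "hospital_code": "H2", "quantity": None, "boxes": 4, "order_date": date(2021, 3, 2)},
    {"drug_code": "D2", "hospital_code": "H1", "quantity": 5, "order_date": date(2019, 1, 1)},
    {"drug_code": "D2", "hospital_code": "H1", "quantity": 5, "order_date": date(2021, 3, 1), "is_valid": False},
]
out = clean_records(
    rows,
    drug_library={"D1": {"dosage_form": "tablet"}},
    hospital_library={"H1": {"grade": "tertiary"}},
    fill_rules={"quantity": lambda r: r.get("boxes")},  # hypothetical rule
    start=date(2021, 1, 1),
    end=date(2021, 12, 31),
)
print([(r["included"], r["exclusion_reason"]) for r in out])
```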
3. The machine learning-based pharmaceutical market size prediction system of claim 2, wherein the step S11 comprises the steps of:
1) obtain predictions for the test set by applying the selected model to the test set data;
2) for regression analysis, assess agreement with the Bland-Altman method and compare the differences with an acceptable error threshold provided by the user's requirements; for cluster analysis, assess agreement with ten-fold cross-validation and a confusion matrix and compare the accuracy with an accuracy threshold provided by the user's requirements;
3) compare the extrapolation performance of the two models, analyze their respective strengths and weaknesses, and form the model evaluation result.
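For the classifier (which the claim calls cluster analysis), the confusion-matrix statistics reported in the description, namely accuracy, Cohen's kappa, and per-class error rates, can be computed as follows. This is a minimal sketch, assuming per-class error is measured against each true class (1 minus the diagonal count over the row total) and that accuracy is simply compared with a user-supplied threshold; ten-fold cross-validation is not shown, and the function name, labels, and threshold are hypothetical.

```python
from collections import Counter


def classification_agreement(actual: list[str], predicted: list[str], labels: list[str]) -> dict:
    """Accuracy, Cohen's kappa and per-class error rate from a confusion matrix."""
    n = len(actual)
    counts = Counter(zip(actual, predicted))
    # confusion[a][p]: number of items whose true class is a and predicted class is p
    confusion = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    row = {c: sum(confusion[c].values()) for c in labels}            # true-class totals
    col = {c: sum(confusion[a][c] for a in labels) for c in labels}  # predicted totals
    accuracy = sum(confusion[c][c] for c in labels) / n
    chance = sum(row[c] * col[c] for c in labels) / (n * n)          # expected agreement
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - chance) / (1 - chance),
        "class_error": {c: 1 - confusion[c][c] / row[c] for c in labels if row[c]},
        "confusion": confusion,
    }


labels = ["below 1x", "1-3x", "above 3x"]
actual = ["below 1x"] * 3 + ["1-3x"] * 3 + ["above 3x"] * 2
predicted = ["below 1x", "below 1x", "1-3x", "1-3x", "1-3x", "below 1x", "above 3x", "1-3x"]
result = classification_agreement(actual, predicted, labels)
ACCURACY_THRESHOLD = 0.6  # hypothetical user-supplied threshold
print(result["accuracy"], result["accuracy"] >= ACCURACY_THRESHOLD)  # 0.625 True
```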
4. The machine learning-based pharmaceutical market size prediction system of claim 3, wherein the step S14 comprises the steps of:
1) predict drug usage for the new procurement cycle with the prediction model, using the drug catalog information for the new cycle and the drug transaction database;
2) where appropriate, make personalized adjustments to the predicted drug purchase volumes according to the user's purchasing behavior data and requirements;
3) present the prediction results to the user, who raises modification requests based on their own experience, and update the user requirements accordingly;
4) when the user has no further modification requests, store the adjusted, adapted model in the model database;
5) after the new procurement cycle has been executed, compare the predicted values with the true values and carry out a deviation analysis to obtain a model reevaluation conclusion and modification suggestions.
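The deviation analysis in item 5) could take the form of a relative deviation per hospital or drug, with items beyond an acceptable error threshold triggering reevaluation. This is a minimal sketch; the threshold value, the item keys, and the rule that any flagged item triggers reevaluation are assumptions, not details given in the claim.

```python
def deviation_report(
    predicted: dict[str, float], actual: dict[str, float], threshold: float
) -> tuple[dict[str, float], list[str], bool]:
    """Relative deviation (predicted - actual) / actual per item, plus items over threshold."""
    deviations = {
        item: (predicted[item] - actual_value) / actual_value
        for item, actual_value in actual.items()
        if item in predicted and actual_value != 0  # skip missing predictions, zero baselines
    }
    flagged = sorted(item for item, dev in deviations.items() if abs(dev) > threshold)
    # Assumed rule: any item beyond the acceptable error threshold triggers reevaluation
    return deviations, flagged, bool(flagged)


devs, flagged, needs_reevaluation = deviation_report(
    {"hospital_A": 110, "hospital_B": 150, "hospital_C": 95},
    {"hospital_A": 100, "hospital_B": 100, "hospital_C": 100},
    threshold=0.2,
)
print(flagged, needs_reevaluation)  # ['hospital_B'] True
```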
5. The machine learning-based drug market size forecasting system of claim 4, wherein in step S5, the drug attributes include essential-drug classification, medical insurance classification, ATC group purchase amount, route of administration, and the like; the hospital attributes include medical institution grade, drug purchasing scale, primary-care classification, administrative region, and the like; market competition includes the number of competing enterprises, the market share of import enterprises, the number of enterprises that have passed the generic consistency evaluation, the number of enterprises ranked among the Top 100 by the Ministry of Industry and Information Technology and their market share, and the like; and sales volume and price include the drug's purchase volume and purchase amount in the previous procurement period and, where the time series of the data is complete, the trend growth rate of the drug's purchase volume together with the weighted average, standard deviation, range, median, maximum, minimum, and the like of the drug's price, as well as the drug's purchase volume in the target procurement period and its purchase-volume multiplier rating.
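Several of the sales-volume and price features listed in claim 5 are simple summary statistics. This is a minimal sketch, assuming the weighted average price is weighted by purchase volume, the standard deviation is the unweighted sample SD, the multiplier is the target-period volume divided by the previous-period volume, and the rating boundaries follow the three classes used in the description ("below 1×", "1-3×", "above 3×") with ratios of exactly 1 and 3 placed in the middle class. All of these choices and the function names are hypothetical.

```python
import statistics


def price_features(prices: list[float], volumes: list[float]) -> dict[str, float]:
    """Summary statistics of a drug's transaction prices in one purchase period."""
    if not prices or len(prices) != len(volumes):
        raise ValueError("prices and volumes must be non-empty and equal in length")
    return {
        # Assumed weighting: each price weighted by its purchase volume
        "weighted_mean": sum(p * v for p, v in zip(prices, volumes)) / sum(volumes),
        "std": statistics.stdev(prices) if len(prices) > 1 else 0.0,
        "range": max(prices) - min(prices),
        "median": statistics.median(prices),
        "max": max(prices),
        "min": min(prices),
    }


def multiplier_rating(target_volume: float, previous_volume: float) -> str:
    """Bucket the target/previous purchase-volume ratio into three assumed classes."""
    ratio = target_volume / previous_volume
    if ratio < 1:
        return "below 1x"
    if ratio <= 3:
        return "1-3x"
    return "above 3x"


feats = price_features([10.0, 12.0, 14.0], [100, 300, 100])
print(feats["weighted_mean"], feats["std"], feats["median"])  # 12.0 2.0 12.0
print(multiplier_rating(250, 100))  # 1-3x
```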
CN202110739439.6A 2021-06-30 2021-06-30 Medicine market scale prediction system based on machine learning Active CN113362116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739439.6A CN113362116B (en) 2021-06-30 2021-06-30 Medicine market scale prediction system based on machine learning

Publications (2)

Publication Number Publication Date
CN113362116A CN113362116A (en) 2021-09-07
CN113362116B true CN113362116B (en) 2022-05-27

Family

ID=77537580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739439.6A Active CN113362116B (en) 2021-06-30 2021-06-30 Medicine market scale prediction system based on machine learning

Country Status (1)

Country Link
CN (1) CN113362116B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114767072B (en) * 2022-06-24 2022-09-06 北京惠每云科技有限公司 Method and device for determining consistency of curative effects of medicines, electronic equipment and storage medium
CN115048874B (en) * 2022-08-16 2023-01-24 北京航空航天大学 Aircraft design parameter estimation method based on machine learning

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN104063744A (en) * 2014-04-15 2014-09-24 浙江大学 Time series prediction method for medicine consumption
CN106355208A (en) * 2016-08-31 2017-01-25 广州精点计算机科技有限公司 Data prediction analysis method based on COX model and random survival forest
CN113034104A (en) * 2021-03-24 2021-06-25 深圳市全药网科技有限公司 Intelligent big data analysis method and system for medicine group purchasing

Also Published As

Publication number Publication date
CN113362116A (en) 2021-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant