CN111681775A - Medicine application analysis method, system and device based on medicine big data - Google Patents

Info

Publication number
CN111681775A
Authority
CN
China
Prior art keywords
data
attribute data
word
label
class
Legal status
Granted
Application number
CN202010495118.1A
Other languages
Chinese (zh)
Other versions
CN111681775B (en)
Inventor
沈灵仙
Current Assignee
Beijing Qiyun Digital Technology Co ltd
Original Assignee
Beijing Qiyun Digital Technology Co ltd
Application filed by Beijing Qiyun Digital Technology Co ltd
Priority to CN202010495118.1A
Publication of CN111681775A
Application granted
Publication of CN111681775B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

The invention relates to the technical field of big data processing, and in particular provides a method, system and device for drug application analysis based on medical big data, aiming to solve the problem of how to perform drug application analysis more accurately on the basis of massive medical big data. First, collected internal and external data related to drugs are processed to form tagged data, which comprise the objects corresponding to each label type, the object label of each object and the associated attribute data. Then, big data analysis is used to obtain, for each attribute data of one or more objects, a category label, a recommendation label and the inter-word association relationships. Finally, the drug application state is analyzed comprehensively from multiple dimensions, such as drug value and drug recommendation, according to the big data analysis results. This overcomes the low accuracy of prior-art drug application analysis, which relies only on data such as drug supply quantity and supply area.

Description

Medicine application analysis method, system and device based on medicine big data
Technical Field
The invention relates to the technical field of big data processing, in particular to a method, a system and a device for medicine application analysis based on medicine big data.
Background
At present, traditional drug application analysis mainly collects data such as the trend in a drug's supply quantity and its supply area, and analyzes the drug's application state from these data. Besides supply quantity and supply area, however, feedback from drug users is also an important factor in drug application analysis, so an analysis based only on supply quantity and supply area is markedly less accurate.
Accordingly, there is a need in the art for a new drug application analysis scheme that addresses the above problems.
Disclosure of Invention
To overcome the above drawbacks, the present invention provides a method, system and device for drug application analysis based on medical big data, which solve, or at least partially solve, the problem of how to perform drug application analysis more accurately on the basis of a large amount of medical big data.
In a first aspect, a method for drug application analysis based on medical big data is provided, the method comprising:
collecting internal data and external data, and processing the collected data respectively to form tagged data of different types;
based on one or more types of tagged data and the attribute data of the one or more objects corresponding to each type: classifying each attribute data of the one or more objects with a classification model algorithm to determine its category label; predicting the recommendation probability of each attribute data with a neural network prediction model algorithm and outputting the prediction result; identifying inter-word association relationships in each attribute data with a vocabulary association analysis model algorithm; and generating abstract information for each attribute data with an abstract algorithm model;
and computing, through business rules, on the determined category labels, the recommendation probability prediction results, the association relationships and the abstract information, merging the resulting data, and extracting the data to the corresponding drug application state analysis end, so as to output a corresponding analysis result in response to a user request.
In one technical solution of the above method, the step of "processing the collected data respectively to form tagged data of different types" includes:
acquiring, according to preset label types, the objects corresponding to each label type in the collected data, and setting a corresponding object label for each object;
acquiring the attribute data associated with each object in the collected data;
and setting the labels of the attribute data associated with each object according to that object's label, and obtaining the tagged data for each label type from the objects corresponding to that label type, the object label of each object and the associated attribute data.
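The tagging step above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the collected internal and external data have already been parsed into (label type, object, text) triples, and the helper `build_tagged_data`, the record layout, the `label_for` policy and the example label types are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggedObject:
    label_type: str      # preset label type, e.g. "hospital" (hypothetical)
    name: str            # object found in the collected data
    object_label: str    # object label assigned to that object
    attribute_data: list = field(default_factory=list)  # associated attribute data


def build_tagged_data(records, label_types, label_for):
    """Group (label_type, object_name, text) triples into tagged data per label type.

    label_for(label_type, object_name) returns the object label (hypothetical policy).
    """
    by_type = defaultdict(dict)
    for label_type, name, text in records:
        if label_type not in label_types:
            continue  # keep only the preset label types
        objects = by_type[label_type]
        if name not in objects:
            objects[name] = TaggedObject(label_type, name, label_for(label_type, name))
        obj = objects[name]
        # Each attribute data item inherits the label of the object it belongs to
        obj.attribute_data.append({"label": obj.object_label, "text": text})
    return {t: list(objs.values()) for t, objs in by_type.items()}


records = [
    ("hospital", "Hospital A", "Drug X prescribed for hypertension."),
    ("patient", "user_17", "Drug X lowered my blood pressure."),
    ("hospital", "Hospital A", "Drug X restocked monthly."),
    ("pharmacy", "Store 3", "Drug X in stock."),  # not a preset label type, dropped
]
tagged = build_tagged_data(records, {"hospital", "patient"}, lambda t, n: f"{t}:{n}")
print(tagged["hospital"][0].object_label)  # hospital:Hospital A
```

Keying objects by name within each label type keeps all attribute data of one object together, which the later per-object-label analysis relies on.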
In one technical solution of the above method, the step of "classifying each attribute data of the one or more objects according to a classification model algorithm to determine the category label of each attribute data" specifically includes:
acquiring data samples from a preset training set, the data samples comprising attribute samples and their corresponding category labels, where a category label comprises a drug name and one or more corresponding indications;
training a pre-constructed naive Bayes classifier on the data samples using a machine learning algorithm;
classifying each attribute data with the trained naive Bayes classifier to obtain its category label;
and/or,
the step of predicting the recommendation probability of each attribute data of the one or more objects according to a neural network prediction model algorithm and outputting the prediction result specifically includes:
determining, from preset seed words and their recommendation categories and recommendation category weights, the recommendation category and recommendation category weight of each word in a preset corpus;
constructing a word vector model from the word vector, recommendation category and recommendation category weight of each word;
using the word vector model to obtain the word vector and recommendation category weight of each word in the attribute data, and deriving the feature vector of the attribute data from these word vectors and weights;
predicting, with an LSTM model, the probability of each recommendation category for the attribute data from its feature vector;
setting the recommendation label of the attribute data to the recommendation category with the maximum probability;
and/or,
the step of identifying the inter-word association relationships in each attribute data of the one or more objects according to a vocabulary association analysis model algorithm specifically includes:
screening the attribute data according to preset screening conditions to obtain target attribute data;
performing word segmentation on the target attribute data, and performing semantic analysis on each word according to the segmentation results;
according to the semantic analysis results, acquiring first-class words, whose word semantics correspond to the category label of the target attribute data, and second-class words, whose word semantics correspond to the recommendation label of the target attribute data; the first-class words include first-subclass words, whose word semantics correspond to the drug name in the category label, and/or second-subclass words, whose word semantics correspond to the indications in the category label;
calculating the degree of correlation between each first-class word and each second-class word according to the correlation formula [Figure BDA0002522544380000031], where R_ij is the degree of correlation between the i-th first-class word and the j-th second-class word, N_i_and_j is the number of times the i-th first-class word and the j-th second-class word appear together in the same target attribute data, and N_i_or_j is the sum of the numbers of occurrences of the i-th first-class word and the j-th second-class word in all target attribute data;
and/or,
the step of generating abstract information for each attribute data of the one or more objects according to the abstract algorithm model specifically includes: generating the corresponding abstract information of each attribute data using an abstract algorithm model based on the TextRank algorithm.
In one technical solution of the above method, when the user request is a drug value analysis, the step of "computing, through business rules, on the determined category labels, the recommendation probability prediction results, the association relationships and the abstract information, merging the resulting data, and extracting the data to the corresponding drug application state analysis end so as to output a corresponding analysis result in response to the user request" specifically includes:
according to the target drug name in the user request and the drug name in the category label of each attribute data, acquiring the attribute data corresponding to the target drug name as the attribute data to be analyzed;
according to the object label of each attribute data to be analyzed, grouping the attribute data to be analyzed that share the same object label to form an analysis data set for each object label;
acquiring the data type of each attribute data to be analyzed in the analysis data set of the current object label and counting the attribute data to be analyzed of each data type; acquiring the first weight of each data type from a preset correspondence between data types and first weights; computing the weighted sum of the per-type counts with the first weights; and outputting the drug value for the current object label according to the result;
and/or,
when the user request is a drug recommendation analysis, the step of "computing, through business rules, on the determined category labels, the recommendation probability prediction results, the association relationships and the abstract information, merging the resulting data, and extracting the data to the corresponding drug application state analysis end so as to output a corresponding analysis result in response to the user request" specifically includes:
according to the target drug name in the user request and the drug name in the category label of each attribute data, acquiring the attribute data corresponding to the target drug name as the attribute data to be analyzed;
dividing each attribute data to be analyzed into sentences; acquiring, in each attribute data to be analyzed, the first-subclass words whose word semantics correspond to the target drug name; counting, from the sentence division results, the first sentence number, i.e. the number of sentences containing these first-subclass words; and outputting the mention volume of the target drug name according to the first sentence number;
acquiring the indications corresponding to the target drug name from the category label of each attribute data to be analyzed; acquiring, in each attribute data to be analyzed, the second-subclass words whose word semantics correspond to these indications; counting, from the sentence division results, the second sentence number, i.e. the number of sentences containing these second-subclass words; and outputting the mention volume of the indications of the target drug name according to the second sentence number;
acquiring, according to the recommendation label of each attribute data to be analyzed, the second-class words whose word semantics correspond to that recommendation label; counting, from the sentence division results, the third sentence number, i.e. the number of sentences containing these second-class words; and outputting the mention volume of the recommendation labels of the target drug name according to the third sentence number;
and/or,
setting a radius value for each first-class word in each attribute data to be analyzed as the reciprocal of that word's degree of correlation;
acquiring the first-class words in all attribute data to be analyzed and grouping identical words together to obtain a word set for each first-class word;
averaging the radius values of all first-class words in each word set and using the average as the radius value of the first-class word corresponding to that word set;
displaying an image with the second-class word at the center and each first-class word placed at its radius value, where the marker size of each first-class word depends on the sum of the numbers of occurrences of the first-class word and the second-class word in the target attribute data;
and/or,
grouping the attribute data to be analyzed by recommendation label and counting the attribute data to be analyzed for each recommendation label;
and acquiring the second weight of each recommendation label type from a preset correspondence between recommendation label types and second weights, computing the weighted sum of the per-label counts with the second weights, and outputting the recommendation value of the target drug name according to the result.
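Two of the steps above, the mention volumes and the recommendation value, reduce to simple counting. The sketch below is illustrative only: it replaces the patent's semantic matching with plain substring matching against a caller-supplied word list, uses a regular-expression sentence splitter, and the label names and second weights in the usage (including the negative weight for "not_recommend") are invented for the example.

```python
import re
from collections import Counter

# Split on Chinese/Western sentence-ending punctuation; a "." counts only when followed
# by whitespace or the end of the text, so decimals such as "0.5 mg" are not split
_SENTENCE_END = re.compile(r"[。！？!?；;\n]+|\.(?:\s+|$)")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def mention_volume(texts, words):
    """Number of sentences, across all attribute data to be analyzed, containing any of `words`."""
    return sum(
        1
        for text in texts
        for sentence in split_sentences(text)
        if any(w in sentence for w in words)
    )


def recommendation_value(labels, second_weights):
    """Weighted sum of the number of attribute data items per recommendation label."""
    counts = Counter(labels)
    return sum(n * second_weights.get(label, 0.0) for label, n in counts.items())


texts = [
    "Drug X lowered my blood pressure. Highly recommended!",
    "Tried drug X. Mild headache. Would use X again.",
]
print(mention_volume(texts, ["X"]))               # 3
print(mention_volume(texts, ["blood pressure"]))  # 1
print(recommendation_value(["recommend", "recommend", "not_recommend"],
                           {"recommend": 1.0, "not_recommend": -1.0}))  # 1.0
```

Counting sentences rather than raw occurrences means a sentence that names the drug twice still contributes one unit of mention volume, which matches the sentence-number wording of the steps.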
In a second aspect, a system for drug application analysis based on medical big data is provided, the system comprising:
a first data processing device, configured to collect internal data and external data and process the collected data respectively to form tagged data of different types;
a second data processing device, configured to: based on one or more types of tagged data and the attribute data of the one or more objects corresponding to each type, classify each attribute data of the one or more objects with a classification model algorithm to determine its category label; predict the recommendation probability of each attribute data with a neural network prediction model algorithm and output the prediction result; identify inter-word association relationships in each attribute data with a vocabulary association analysis model algorithm; and generate abstract information for each attribute data with an abstract algorithm model;
and a data analysis device, configured to compute, through business rules, on the determined category labels, the recommendation probability prediction results, the association relationships and the abstract information, merge the resulting data, and extract the data to the corresponding drug application state analysis end, so as to output a corresponding analysis result in response to a user request.
In one technical solution of the above system, the first data processing device is configured to perform the following operations:
acquiring, according to preset label types, the objects corresponding to each label type in the collected data, and setting a corresponding object label for each object;
acquiring the attribute data associated with each object in the collected data;
and setting the labels of the attribute data associated with each object according to that object's label, and obtaining the tagged data for each label type from the objects corresponding to that label type, the object label of each object and the associated attribute data.
In one technical solution of the above system, the second data processing device includes a category label acquisition module, and/or a recommendation label acquisition module, and/or an inter-word association relationship identification module, and/or an abstract information generation module;
the category label acquisition module is configured to perform the following operations:
acquiring data samples from a preset training set, the data samples comprising attribute samples and their corresponding category labels, where a category label comprises a drug name and one or more corresponding indications;
training a pre-constructed naive Bayes classifier on the data samples using a machine learning algorithm;
classifying each attribute data with the trained naive Bayes classifier to obtain its category label;
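A from-scratch multinomial naive Bayes classifier illustrates the category-label step. The patent does not specify the naive Bayes variant, so this sketch assumes the multinomial form with add-one (Laplace) smoothing, whitespace tokenization (Chinese text would first need a word segmenter), and category labels modeled as (drug name, indications) tuples; the class name, training samples and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with additive smoothing over whitespace tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 gives Laplace smoothing

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(samples, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text):
        n_samples = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_samples)  # log prior of the category label
            for w in text.lower().split():
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[label][w] + self.alpha
                score += math.log(num / (self.total_words[label] + self.alpha * v))
            if score > best_score:
                best, best_score = label, score
        return best


samples = [
    "drug a lowers blood pressure",
    "drug a for hypertension patients",
    "drug b relieves migraine headache",
    "drug b for headache",
]
labels = [("Drug A", ("hypertension",))] * 2 + [("Drug B", ("migraine",))] * 2
clf = NaiveBayesTextClassifier().fit(samples, labels)
print(clf.predict("blood pressure drug a"))  # ('Drug A', ('hypertension',))
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long attribute texts.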
the recommendation label acquisition module is configured to perform the following operations:
determining, from preset seed words and their recommendation categories and recommendation category weights, the recommendation category and recommendation category weight of each word in a preset corpus;
constructing a word vector model from the word vector, recommendation category and recommendation category weight of each word;
using the word vector model to obtain the word vector and recommendation category weight of each word in the attribute data, and deriving the feature vector of the attribute data from these word vectors and weights;
predicting, with an LSTM model, the probability of each recommendation category for the attribute data from its feature vector;
setting the recommendation label of the attribute data to the recommendation category with the maximum probability;
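The feature construction and label selection above can be sketched in plain Python. This is a sketch under stated assumptions, not the patent's model: word vectors and per-category weights are taken as given (for example from a separately trained word-vector model), each word's feature is assumed to be its word vector followed by its weight for every recommendation category, and `TinyLSTMClassifier` is a hypothetical single-layer LSTM forward pass with a softmax head whose weights are random, i.e. untrained. In practice the weights would be learned from labeled attribute data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def feature_sequence(tokens, word_vectors, category_weights, categories):
    """One feature per known word: its word vector followed by its weight for each category."""
    seq = []
    for t in tokens:
        if t in word_vectors:
            weights = category_weights.get(t, {})
            seq.append(list(word_vectors[t]) + [weights.get(c, 0.0) for c in categories])
    return seq


class TinyLSTMClassifier:
    """Single-layer LSTM forward pass plus softmax head (random, untrained weights)."""

    def __init__(self, input_dim, hidden_dim, n_classes, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (i), forget (f), output (o) gates and candidate cell (c), each acting on [x_t ; h_{t-1}]
        self.W = {g: mat(hidden_dim, input_dim + hidden_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.W_out = mat(n_classes, hidden_dim)

    def predict_proba(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            z = x + h  # concatenate input and previous hidden state
            i = [sigmoid(v + b) for v, b in zip(matvec(self.W["i"], z), self.b["i"])]
            f = [sigmoid(v + b) for v, b in zip(matvec(self.W["f"], z), self.b["f"])]
            o = [sigmoid(v + b) for v, b in zip(matvec(self.W["o"], z), self.b["o"])]
            g = [math.tanh(v + b) for v, b in zip(matvec(self.W["c"], z), self.b["c"])]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return softmax(matvec(self.W_out, h))  # probability of each recommendation category


def recommendation_label(probs, categories):
    """The recommendation label is the category with the maximum predicted probability."""
    return categories[max(range(len(probs)), key=probs.__getitem__)]


categories = ["recommend", "neutral", "not_recommend"]
word_vectors = {"effective": [0.2, 0.1], "side": [-0.1, 0.3], "effects": [0.0, 0.4]}
category_weights = {"effective": {"recommend": 0.9}, "side": {"not_recommend": 0.6}}
seq = feature_sequence("effective but side effects".split(), word_vectors, category_weights, categories)
model = TinyLSTMClassifier(input_dim=5, hidden_dim=4, n_classes=3)
probs = model.predict_proba(seq)
print(recommendation_label(probs, categories))
```

The printed label is arbitrary here because the weights are untrained; the sketch only shows how the feature sequence flows through the LSTM to a maximum-probability label.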
the inter-word association relationship identification module is configured to perform the following operations:
screening the attribute data according to preset screening conditions to obtain target attribute data;
performing word segmentation on the target attribute data, and performing semantic analysis on each word according to the segmentation results;
according to the semantic analysis results, acquiring first-class words, whose word semantics correspond to the category label of the target attribute data, and second-class words, whose word semantics correspond to the recommendation label of the target attribute data; the first-class words include first-subclass words, whose word semantics correspond to the drug name in the category label, and/or second-subclass words, whose word semantics correspond to the indications in the category label;
calculating the degree of correlation between each first-class word and each second-class word according to the correlation formula [Figure BDA0002522544380000071], where R_ij is the degree of correlation between the i-th first-class word and the j-th second-class word, N_i_and_j is the number of times the i-th first-class word and the j-th second-class word appear together in the same target attribute data, and N_i_or_j is the sum of the numbers of occurrences of the i-th first-class word and the j-th second-class word in all target attribute data;
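Because the correlation formula survives only as an image, the sketch below assumes the ratio form R_ij = N_i_and_j / N_i_or_j suggested by the variable definitions. It also assumes occurrences are counted per target attribute data item (presence, not token frequency) and takes N_i_or_j literally as the sum of the two words' occurrence counts. All three choices are assumptions, and the example words are invented.

```python
from itertools import product


def correlation_degrees(docs, first_words, second_words):
    """docs: one set of words per target attribute data item.

    Assumed form: R_ij = N_i_and_j / N_i_or_j, where N_i_and_j counts items containing
    both words and N_i_or_j is the sum of the item counts of word i and word j.
    """
    degrees = {}
    for wi, wj in product(first_words, second_words):
        n_and = sum(1 for d in docs if wi in d and wj in d)
        n_or = sum(1 for d in docs if wi in d) + sum(1 for d in docs if wj in d)
        degrees[(wi, wj)] = n_and / n_or if n_or else 0.0
    return degrees


docs = [
    {"Drug A", "hypertension", "effective"},
    {"Drug A", "effective"},
    {"hypertension", "ineffective"},
]
R = correlation_degrees(docs, ["Drug A", "hypertension"], ["effective"])
print(R[("Drug A", "effective")])        # 0.5  (2 / (2 + 2))
print(R[("hypertension", "effective")])  # 0.25 (1 / (2 + 2))
```

Under this reading R_ij lies between 0 and 0.5; if N_i_or_j is instead the number of items containing either word, replacing the sum with a union count turns R_ij into a Jaccard index.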
the abstract information generation module is configured to generate corresponding abstract information of each attribute data by using an abstract algorithm model based on a TextRank algorithm.
In one technical solution of the above system, the data analysis device includes a first data analysis device and/or a second data analysis device;
the first data analysis device is configured to perform the following operations when the user request is a drug value analysis:
according to the target drug name in the user request and the drug name in the category label of each attribute data, acquiring the attribute data corresponding to the target drug name as the attribute data to be analyzed;
according to the object label of each attribute data to be analyzed, grouping the attribute data to be analyzed that share the same object label to form an analysis data set for each object label;
acquiring the data type of each attribute data to be analyzed in the analysis data set of the current object label and counting the attribute data to be analyzed of each data type; acquiring the first weight of each data type from a preset correspondence between data types and first weights; computing the weighted sum of the per-type counts with the first weights; and outputting the drug value for the current object label according to the result;
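The drug value analysis above is a filter, a group-by and a weighted sum. A minimal sketch, assuming each attribute data item is a dict with the hypothetical keys `drug` (the drug name from its category label), `object_label` and `data_type`; the data types and first weights in the usage are invented.

```python
from collections import Counter, defaultdict


def drug_value_by_object(attribute_data, target_drug, first_weights):
    """Drug value per object label for one target drug.

    first_weights: preset mapping from data type to first weight (hypothetical values).
    """
    # Attribute data to be analyzed: category-label drug name matches the requested drug
    to_analyze = [a for a in attribute_data if a["drug"] == target_drug]
    # One analysis data set per object label, reduced to counts per data type
    per_object = defaultdict(Counter)
    for a in to_analyze:
        per_object[a["object_label"]][a["data_type"]] += 1
    # Weighted sum of the per-type counts with the first weights
    return {
        label: sum(n * first_weights.get(dt, 0.0) for dt, n in counts.items())
        for label, counts in per_object.items()
    }


data = [
    {"drug": "Drug A", "object_label": "hospital", "data_type": "prescription"},
    {"drug": "Drug A", "object_label": "hospital", "data_type": "prescription"},
    {"drug": "Drug A", "object_label": "hospital", "data_type": "review"},
    {"drug": "Drug A", "object_label": "patient", "data_type": "review"},
    {"drug": "Drug B", "object_label": "patient", "data_type": "review"},
]
print(drug_value_by_object(data, "Drug A", {"prescription": 2.0, "review": 0.5}))
# {'hospital': 4.5, 'patient': 0.5}
```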
the second data analysis device is configured to perform the following operations when the user request is a drug recommendation analysis:
according to the target drug name in the user request and the drug name in the category label of each attribute data, acquiring the attribute data corresponding to the target drug name as the attribute data to be analyzed;
dividing each attribute data to be analyzed into sentences; acquiring, in each attribute data to be analyzed, the first-subclass words whose word semantics correspond to the target drug name; counting, from the sentence division results, the first sentence number, i.e. the number of sentences containing these first-subclass words; and outputting the mention volume of the target drug name according to the first sentence number;
acquiring the indications corresponding to the target drug name from the category label of each attribute data to be analyzed; acquiring, in each attribute data to be analyzed, the second-subclass words whose word semantics correspond to these indications; counting, from the sentence division results, the second sentence number, i.e. the number of sentences containing these second-subclass words; and outputting the mention volume of the indications of the target drug name according to the second sentence number;
acquiring, according to the recommendation label of each attribute data to be analyzed, the second-class words whose word semantics correspond to that recommendation label; counting, from the sentence division results, the third sentence number, i.e. the number of sentences containing these second-class words; and outputting the mention volume of the recommendation labels of the target drug name according to the third sentence number;
and/or,
setting a radius value for each first-class word in each attribute data to be analyzed as the reciprocal of that word's degree of correlation;
acquiring the first-class words in all attribute data to be analyzed and grouping identical words together to obtain a word set for each first-class word;
averaging the radius values of all first-class words in each word set and using the average as the radius value of the first-class word corresponding to that word set;
displaying an image with the second-class word at the center and each first-class word placed at its radius value, where the marker size of each first-class word depends on the sum of the numbers of occurrences of the first-class word and the second-class word in the target attribute data;
and/or,
grouping the attribute data to be analyzed by recommendation label and counting the attribute data to be analyzed for each recommendation label;
and acquiring the second weight of each recommendation label type from a preset correspondence between recommendation label types and second weights, computing the weighted sum of the per-label counts with the second weights, and outputting the recommendation value of the target drug name according to the result.
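The radius computation behind the correlation image (compare FIG. 3) can be sketched as below. Assumptions: each observation is a hypothetical (first-class word, degree of correlation with the central second-class word, occurrence count) triple for one attribute data item; marker size is taken as the summed occurrence count; zero correlations are skipped because their reciprocal is undefined; and words are spread at evenly spaced angles, which the patent does not specify. No plotting library is used; the function only returns coordinates and sizes.

```python
import math
from collections import defaultdict


def radial_layout(observations):
    """observations: (first_word, degree_of_correlation, occurrences) per attribute data item."""
    radii = defaultdict(list)
    sizes = defaultdict(int)
    for word, degree, occurrences in observations:
        if degree > 0:  # reciprocal is undefined for zero correlation
            radii[word].append(1.0 / degree)  # radius = reciprocal of the correlation degree
        sizes[word] += occurrences
    words = sorted(radii)  # one word set per distinct first-class word
    layout = {}
    for k, word in enumerate(words):
        r = sum(radii[word]) / len(radii[word])  # average radius over the word set
        angle = 2 * math.pi * k / len(words)     # evenly spaced angles: an assumption
        layout[word] = {
            "radius": r,
            "x": r * math.cos(angle),  # second-class word sits at the origin
            "y": r * math.sin(angle),
            "size": sizes[word],
        }
    return layout


layout = radial_layout([("Drug A", 0.5, 3), ("Drug A", 0.25, 2), ("hypertension", 0.2, 4)])
print(layout["Drug A"]["radius"], layout["Drug A"]["size"])  # 3.0 5
```

Using the reciprocal places strongly correlated words close to the central second-class word, so proximity in the image reads directly as association strength.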
In a third aspect, a storage device is provided, which stores a plurality of program codes adapted to be loaded and executed by a processor to perform the method for drug application analysis based on medical big data according to any one of the above technical solutions.
In a fourth aspect, a control device is provided, comprising a processor and a storage device, where the storage device is adapted to store a plurality of program codes adapted to be loaded and executed by the processor to perform the method for drug application analysis based on medical big data according to any one of the above technical solutions.
One or more technical solutions of the invention have at least one or more of the following beneficial effects:
In the technical solutions of the invention, application analysis of items such as drugs can be carried out more comprehensively and accurately on the basis of massive item data. Specifically, the collected internal and external data related to drugs are processed to form tagged data, which comprise the objects corresponding to each label type, the object label of each object and the associated attribute data. The tagged data are then classified with a classification model algorithm to determine the category label of each attribute data, which comprises a drug name and one or more corresponding indications. A neural network prediction model algorithm predicts recommendation probabilities on the tagged data to determine the recommendation label of each attribute data, which carries recommendation category information such as "recommended" or "not recommended". A vocabulary association analysis model algorithm analyzes the inter-word association relationships in the tagged data to determine the degree of association between the words corresponding to drug names and the words corresponding to indications in the attribute data, so that the application state of a drug under different indications can be analyzed from these degrees of association. Finally, according to the user request (drug value analysis and/or drug recommendation analysis), the analysis/processing results of the tagged data are extracted to the corresponding drug application state analysis end for analysis. Drug value analysis yields the value analysis results of different objects for the same drug, and drug recommendation analysis yields the application recommendation results of different objects for the same drug.
Through these steps, the drug application state can be analyzed comprehensively from multiple dimensions, such as drug value and drug recommendation, which overcomes the low accuracy of prior-art drug application analysis that relies only on data such as drug supply quantity and supply area.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of the main steps of a method for medicine application analysis based on medicine big data according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an image displaying medicine values according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an image displaying the degree of correlation between first-class words and second-class words according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main structure of a system for medicine application analysis based on medicine big data according to an embodiment of the invention;
List of reference numerals:
11: first data processing device; 12: second data processing device; 13: data analysis device.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, suitable sensors, communication ports and memory; it may comprise software components such as program code; or it may be a combination of software and hardware. The processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor, and has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory and random-access memory. The term "A and/or B" denotes all possible combinations of A and B, namely A alone, B alone, or both A and B. The terms "at least one of A or B" and "at least one of A and B" have a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may also include the plural forms.
In prior-art medicine application analysis, the application state of a medicine is analyzed mainly from data such as the trend in its supply quantity and its supply areas. However, the application state of a medicine is determined by multiple factors, including supply quantity, supply area and user feedback, so analysis based only on data such as supply quantity and supply area is markedly less accurate.
In the embodiments of the invention, application analysis of articles can be carried out more comprehensively and accurately on the basis of massive article data, such as medicine data. Specifically, in the method for medicine application analysis based on medicine big data according to the embodiments, the collected internal data and external data related to medicines are processed to form tagged data, which contains the objects corresponding to each label type, together with the object label and associated attribute data of each object. The tagged data is then classified with a classification model algorithm to determine the category label of each piece of attribute data, where the category label comprises a medicine name and one or more corresponding indications. Recommendation probabilities are predicted with a neural network prediction model algorithm to determine the recommendation label of each piece of attribute data, where the recommendation label carries recommendation category information such as "recommended" or "not recommended". Inter-word association analysis is performed with a word association analysis model algorithm to determine the degree of association between words corresponding to medicine names and words corresponding to indications in the attribute data, so that the application state of a medicine under different indications can be analyzed from these degrees of association. Finally, according to a user request (for example, medicine value analysis or medicine recommendation analysis), the analysis and processing results of the tagged data are extracted to the corresponding medicine application state analysis end for analysis.
Medicine value analysis yields the value results of different objects for the same medicine, and medicine recommendation analysis yields the recommendation results of different objects for the same medicine. With these steps, the method analyzes the medicine application state from multiple dimensions, such as medicine value and medicine recommendation, and thereby overcomes the low accuracy of prior-art approaches that analyze medicine application only from data such as supply quantity and supply area.
Referring to FIG. 1, FIG. 1 is a flow chart of the main steps of a method for medicine application analysis based on medicine big data according to an embodiment of the invention. As shown in FIG. 1, the method may include the following steps:
Step S101: Acquire internal data and external data, and process the acquired data to form different types of tagged data.
In this embodiment, both the internal data and the external data are article-related data. The internal data is article-related data that has been acquired and stored in advance, for example in a database or on a computer-readable storage medium. The external data is article-related data that has not been acquired and stored in advance; it must be obtained from a data platform that stores article-related data, using a data acquisition method such as web crawling. In one embodiment, the article is a medicine, so the internal data is medicine-related data acquired and stored in advance, and the external data is medicine-related data that must be obtained from a data platform storing such data. The medicine-related data includes, but is not limited to, medical data such as doctor information, hospital information, documents published by doctors, and doctors' diagnosis information.
In this embodiment, tagged data is data that carries label information, formed by setting labels on the internal data and external data. In one embodiment, the collected internal data and external data can be processed into different types of tagged data through the following steps:
Step S1011: According to preset label types, obtain the objects corresponding to each label type in the collected data, and set a corresponding object label for each object.
The collected data in this embodiment refers to the collected internal data and external data. These data usually contain multiple data objects, which may be of the same type or of different types. Setting labels on the data makes explicit which types of data objects the data contains; a preset label type is a predefined type to which data objects belong. In one embodiment, the internal data and external data are medical data, and the preset label types include, but are not limited to, a doctor class, a hospital class and a medicine class. The doctor class is the label type of doctor names in the data, the hospital class is the label type of hospital names, and the medicine class is the label type of medicine names; doctor names, hospital names and medicine names are all data objects. For example, suppose the preset label types are the doctor class and the medicine class, and the internal data contains a document L1 published by doctor A about medicine a and a document L2 published by doctor B about medicine b. According to the preset label types, the objects of the doctor class are doctor A and doctor B, and the objects of the medicine class are medicine a and medicine b. The object labels are then set to "doctor A", "doctor B", "medicine a" and "medicine b", respectively.
Step S1012: attribute data associated with each object in the collected data is obtained.
The attribute data associated with an object in this embodiment is the data in the collected data that relates to that object. For example, if the collected data includes the resume of doctor A, information about the hospital where doctor A works, and a document L1 about medicine a published by doctor A, then the resume, the hospital information and document L1 are all attribute data associated with doctor A.
Step S1013: Set the label of the attribute data associated with each object according to that object's label, and obtain the tagged data for each label type from the objects of that type (for example, the doctor class or the medicine class), the object label of each object, and its associated attribute data.
For example, if the object label of doctor A is "doctor A", the labels of its associated attribute data, such as the resume, the hospital information and document L1, are also set to "doctor A".
Step S102: Acquire the category label and recommendation label of the attribute data, identify the association relationships between words in the attribute data, and generate summary information for the attribute data.
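Steps S1011 to S1013 can be pictured as a grouping pass over the collected records. The sketch below is illustrative only: the record format, the TaggedObject class and the build_tagged_data function are hypothetical, and it assumes the object names in each record have already been extracted, using each name directly as its object label.

```python
from dataclasses import dataclass, field

@dataclass
class TaggedObject:
    label_type: str    # preset label type, e.g. "doctor" or "medicine"
    object_label: str  # label set for the object in step S1011
    attributes: list = field(default_factory=list)  # (label, attribute data) pairs

def build_tagged_data(records, label_types):
    """Group collected records into tagged data, one dict of objects per label type.

    Each record is a hypothetical dict {"content": ..., "objects": {label_type: [names]}};
    the object names are assumed to have been extracted already.
    """
    tagged = {t: {} for t in label_types}
    for rec in records:
        for t in label_types:
            for name in rec["objects"].get(t, []):
                # S1011: one object per name, with the name used as its object label.
                obj = tagged[t].setdefault(name, TaggedObject(t, name))
                # S1012/S1013: the record is attribute data of this object and
                # receives the same label as the object.
                obj.attributes.append((obj.object_label, rec["content"]))
    return tagged

records = [
    {"content": "resume of doctor A", "objects": {"doctor": ["doctor A"]}},
    {"content": "document L1", "objects": {"doctor": ["doctor A"], "medicine": ["medicine a"]}},
    {"content": "document L2", "objects": {"doctor": ["doctor B"], "medicine": ["medicine b"]}},
]
tagged = build_tagged_data(records, ["doctor", "medicine"])
```

Here document L1 ends up as attribute data of both doctor A and medicine a, each time carrying the label of the object it is attached to.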
As explained for step S101, both the internal data and the external data are article-related data, and the category label of a piece of attribute data is the category information of the article contained in that data. In one embodiment, where both the internal data and the external data are medicine-related data, the category label of the attribute data may include a medicine name and one or more corresponding indications. For example, the medicine Tongxinluo ("dredging the heart meridian") has indications including coronary heart disease, angina pectoris, myocardial ischemia and myocardial infarction.
The recommendation label of a piece of attribute data is information, obtained by analyzing the item-recommendation content in that data, that indicates the recommendation type the data expresses for the item. Item recommendation types include, but are not limited to, "recommended", "not recommended", and "no clear tendency either way". For example, if the attribute data is a research document on a medicine and its recommendation label is "not recommended", it can be concluded from that data that the medicine is not recommended.
The inter-word association relationship of the attribute data is the correlation, within the attribute data, between first-class words (words whose semantics correspond to the category label) and second-class words (words whose semantics correspond to the recommendation label). Identifying the correlation between these two classes of words yields the recommendation label associated with each category label. Since a category label represents the category of an article and a recommendation label represents its recommendation type, this gives the recommendation type, such as "recommended" or "not recommended", for each category of article.
The methods for obtaining the category label, the recommendation label, the inter-word association relationships and the summary information of the attribute data are described below in turn.
1. Category labels for attribute data
In this embodiment, the category label of the attribute data may be obtained by the following steps:
Step 11: Acquire data samples from a preset training set.
In this embodiment, each data sample includes an attribute sample and its corresponding category label. The data samples, attribute samples and category labels have the same meanings as the internal and external data, the attribute data and the category labels described in steps S101 and S102, so they are not described again here.
In one embodiment, if the data samples in the preset training set are medical data, information such as medicine names and indications can be obtained from sources such as medicine package inserts and then used to annotate the previously collected medical data with medicine names and/or indications, that is, to set a category label for each piece of medical data.
Step 12: Using the data samples acquired in step 11, train a pre-constructed naive Bayes classifier with a machine learning algorithm. The naive Bayes classifier and its training method used in this embodiment are conventional in machine learning and are not described further here.
Once trained, the naive Bayes classifier is used to classify each piece of attribute data obtained in step S101, giving the category label of each piece of attribute data.
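Since the text does not say which naive Bayes variant is used, the sketch below assumes a multinomial naive Bayes over word tokens with add-one (Laplace) smoothing, one common choice for text. The class name, the toy tokens and the label tuples (including "drug X") are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over word tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per category label
        self.word_counts = defaultdict(Counter)      # label -> word -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        """Return the category label with the highest posterior log-probability."""
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior
            for w in tokens:
                # Smoothed log likelihood; unseen words get a small nonzero probability.
                score += math.log((self.word_counts[label][w] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = [
    ["tongxinluo", "angina", "capsule"],
    ["coronary", "heart", "angina"],
    ["blood", "glucose", "insulin"],
    ["diabetes", "insulin"],
]
train_labels = [
    ("Tongxinluo", "angina pectoris"),
    ("Tongxinluo", "angina pectoris"),
    ("drug X", "diabetes"),
    ("drug X", "diabetes"),
]
clf = NaiveBayesClassifier().fit(train_docs, train_labels)
```

Working in log space avoids underflow when a document has many words; the trained classifier then assigns a (medicine name, indication) label to each new token list.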
2. Recommendation tag for attribute data
In this embodiment, the recommended label of the attribute data may be obtained according to the following steps:
Step 21: According to preset seed words and their recommendation categories and recommendation category weights, determine the recommendation category and recommendation category weight of each word in a preset corpus; then construct a word vector model from the word vector, recommendation category and recommendation category weight of each word.
The preset seed words may be words in a preset corpus, or words stored in other databases/storage devices other than the preset corpus.
In this embodiment, the recommendation category and recommendation category weight of a given word in the preset corpus can be obtained as follows: compute the similarity between the word and each seed word, and take the recommendation category of the seed word with the greatest similarity as the word's recommendation category. For example, suppose the seed words are A, B and C, with recommendation categories a, b and c, respectively. For a word w in the preset corpus, compute its similarities s1, s2 and s3 to seed words A, B and C. If s1 > s2 > s3, the greatest similarity is s1, for seed word A, so the recommendation category a of seed word A becomes the recommendation category of w. Once a word's recommendation category is known, its recommendation category weight is calculated as weight_i = weight_sj × s_ij, where weight_i is the recommendation category weight of the i-th word, weight_sj is the recommendation category weight of the j-th seed word, s_ij is the similarity between the i-th word and the j-th seed word, and the j-th seed word is the seed word whose recommendation category was assigned to the i-th word, i.e. the seed word with the greatest similarity.
For example, if the recommendation category of word w is category a of seed word A, the recommendation category weight of w is weight_w = weight_sA × s_wA, where weight_sA is the recommendation category weight of seed word A and s_wA is the similarity between word w and seed word A.
For example, the preset seed words and their recommendation categories and recommendation category weights are shown in Table 1.
TABLE 1
Recommendation category | Seed words | Recommendation category weight
Recommended | suggest use, recommend, use | 0.8
Not recommended | stop taking medicine, stop using medicine, stop taking medicine | -0.8
As shown in Table 1, if the word "do not eat" in the preset corpus has the greatest similarity, equal to 1, to the seed word "stop taking medicine", its recommendation category is "not recommended" and its recommendation category weight is -0.8 × 1 = -0.8.
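The seed-word rule (take the category of the most similar seed word, then weight_i = weight_sj × s_ij) can be sketched as follows. The text does not name the similarity measure, so cosine similarity is assumed here; the two-dimensional seed vectors are toy values, and the cosine and assign_category functions are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_category(word_vec, seeds):
    """seeds maps seed word -> (vector, recommendation category, category weight).

    Returns (category, weight_i) where weight_i = weight_sj * s_ij for the
    seed word j that is most similar to the word.
    """
    best_seed = max(seeds, key=lambda s: cosine(word_vec, seeds[s][0]))
    seed_vec, category, weight_sj = seeds[best_seed]
    s_ij = cosine(word_vec, seed_vec)
    return category, weight_sj * s_ij

# Toy seed vectors; real ones would come from the trained word vectors.
seeds = {
    "recommend": ([1.0, 0.0], "recommended", 0.8),
    "stop taking medicine": ([0.0, 1.0], "not recommended", -0.8),
}
category, weight = assign_category([0.0, 2.0], seeds)  # similarity 1.0 to "stop taking medicine"
```

With a similarity of 1 to "stop taking medicine", the word receives the category "not recommended" and weight -0.8 × 1 = -0.8, matching the Table 1 example.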
In this embodiment, the word vector model can be constructed as follows: multiply the word vector of each word by its recommendation category weight, and then stack the resulting weighted vectors to obtain the word vector model. In one embodiment, the word vectors of the words in the preset corpus can be obtained with the CBOW training model of Word2Vec (word to vector), and the weighted vectors of all words can be combined with a stack operation (array concatenation along a new axis) to obtain the word vector model.
For example, suppose the preset corpus contains words a, b and c, whose word vectors are A, B and C and whose recommendation category weights are w1, w2 and w3, respectively. Multiplying each word vector by its weight gives w1A, w2B and w3C, and stacking these results gives the word vector model Y = [w1A, w2B, w3C].
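The weighting-and-stacking step can be sketched in plain Python, with each row of Y being one weighted word vector. The function name and toy vectors are hypothetical; with NumPy, the same result would come from numpy.stack applied to the scaled vectors.

```python
def build_word_vector_model(word_vectors, weights):
    """Multiply each word vector by its recommendation category weight and stack
    the results row by row (the word vector model Y); returns (vocab, Y)."""
    vocab = list(word_vectors)  # row order follows the dict's insertion order
    Y = [[weights[w] * x for x in word_vectors[w]] for w in vocab]
    return vocab, Y

# Toy word vectors A, B, C for words a, b, c and weights w1, w2, w3.
word_vectors = {"a": [1.0, 2.0], "b": [0.5, 0.0], "c": [0.0, -1.0]}
weights = {"a": 0.8, "b": -0.8, "c": 0.5}
vocab, Y = build_word_vector_model(word_vectors, weights)  # Y = [w1A, w2B, w3C]
```

Keeping the vocabulary order alongside Y makes it possible to look up the row belonging to a given word later.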
Step 22: Use the constructed word vector model to obtain the word vector and recommendation category weight of each word in the attribute data, and derive the feature vector of the attribute data from them. In one embodiment, a convolutional neural network (CNN) can be used: the convolutional layer of the CNN first extracts the data features of each word in the attribute data; the extracted features are then multiplied, as a vector-matrix product, by the word vector model to obtain the word vector of each word; finally, all the word vectors are concatenated to obtain the feature vector of the attribute data.
Step 23: Use an LSTM model to predict, from the feature vector of the attribute data, the probability of each recommendation category for that data. The LSTM model is a network model built on a long short-term memory (LSTM) network. The LSTM model and its training method used in this embodiment are a conventional network structure and training method in neural networks and machine learning, and are not described further here.
Step 24: Set the recommendation label of the attribute data to the recommendation category with the greatest probability. For example, if the category with the greatest probability is "not recommended", the recommendation label of the attribute data is set to "not recommended".
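Step 24 reduces to an argmax over the predicted probabilities. The sketch assumes, as a hypothetical setup, that the LSTM (not shown) outputs one raw score (logit) per recommendation category and that a softmax turns these scores into probabilities; the category list and scores are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recommendation_label(logits, categories):
    """Return the category with the greatest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(categories)), key=probs.__getitem__)
    return categories[best], probs[best]

categories = ["recommended", "not recommended", "no clear tendency"]
label, prob = recommendation_label([0.1, 2.0, -1.0], categories)
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the raw scores; the probabilities are only needed if a confidence value is also reported.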
3. Inter-word association relationships of attribute data
In this embodiment, the association relationship between words of the attribute data may be obtained according to the following steps:
Step 31: Screen the attribute data against preset screening conditions to obtain the target attribute data. The preset screening conditions in this embodiment include, but are not limited to, the object to which the attribute data belongs, the region where that object is located, the category label of the attribute data, and the publication time of the attribute data. In one embodiment, where the internal data and external data are both medicine-related data, the attribute data is a document published by a doctor, and the category label includes a medicine name and one or more corresponding indications, the preset screening conditions include, but are not limited to, the doctor's name, the province or city where the doctor is located, the medicine name, the indication name, and the publication time.
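Screening by preset conditions is essentially a conjunctive filter. In this sketch the record fields ("doctor", "province", "drug", "indication", "published") and the screen function are hypothetical; a condition is either an exact value or a predicate, which covers cases such as a publication-time range.

```python
from datetime import date

def screen(attribute_data, **conditions):
    """Keep records that satisfy every condition.

    A condition is either a value (field must equal it) or a callable
    predicate applied to the field value.
    """
    def matches(rec):
        for key, cond in conditions.items():
            value = rec.get(key)
            if callable(cond):
                if not cond(value):
                    return False
            elif value != cond:
                return False
        return True
    return [rec for rec in attribute_data if matches(rec)]

docs = [
    {"doctor": "doctor A", "province": "Beijing", "drug": "Tongxinluo",
     "indication": "angina pectoris", "published": date(2019, 5, 1)},
    {"doctor": "doctor B", "province": "Hebei", "drug": "Tongxinluo",
     "indication": "myocardial infarction", "published": date(2015, 3, 2)},
]
# Target attribute data: Tongxinluo documents published from September 2016 on.
targets = screen(docs, drug="Tongxinluo", published=lambda d: d >= date(2016, 9, 1))
```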
Step 32: Perform word segmentation on the target attribute data, and perform semantic analysis on each word based on the segmentation result. In one embodiment, if the target attribute data is Chinese text, it can be segmented with the Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), and each word in the target attribute data can then be semantically analyzed with a hidden Markov model (HMM). Although only this embodiment, using ICTCLAS for segmentation and an HMM for semantic analysis, is described, those skilled in the art will appreciate that the scope of the invention is not limited to it. Other word segmentation and semantic analysis methods may be used without departing from the principle of the invention, and such modifications and substitutions fall within its scope.
Step 33: From the semantic analysis result, obtain the first-class words, whose semantics correspond to the category label of the target attribute data, and the second-class words, whose semantics correspond to its recommendation label. The first-class words comprise first-subclass words, whose semantics correspond to the medicine name in the category label, and/or second-subclass words, whose semantics correspond to an indication in the category label.
For example, suppose the medicine name in the category label of the target attribute data is "Tongxinluo", the recommendation label is "not recommended", and the target attribute data is the following doctor-patient exchange. Patient: "Can Hua Fa Na be used in place of the Tongxinluo capsule?" Doctor: "Tongxinluo is not a therapeutic drug and cannot replace it; Hua Fa Na is an anticoagulant that prevents thrombosis and sudden hemiplegia." Semantic analysis then finds that the first-class words whose semantics correspond to the medicine name "Tongxinluo" are "Tongxinluo capsule" and "Tongxinluo", and that the second-class word whose semantics correspond to "not recommended" is "cannot replace".
Step 34: Calculate the degree of correlation between each first-class word and each second-class word with formula (1):
$$R_{ij} = \frac{N_{i\_and\_j}}{N_{i\_or\_j}} \qquad (1)$$
The parameters in formula (1) are as follows: R_ij is the degree of correlation between the i-th first-class word and the j-th second-class word; N_i_and_j is the number of times the i-th first-class word and the j-th second-class word appear together in the same target attribute data; and N_i_or_j is the sum of the numbers of times the i-th first-class word and the j-th second-class word appear in all target attribute data.
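A sketch of the correlation calculation, assuming that formula (1) is the ratio R_ij = N_i_and_j / N_i_or_j and that "number of times" means the number of pieces of target attribute data containing a word. The default follows the literal "sum" wording of N_i_or_j; the use_union flag is a hypothetical Jaccard-style variant that counts each document at most once. The function name and toy documents are illustrative.

```python
def correlation(docs, first_word, second_word, use_union=False):
    """Degree of correlation between a first-class word and a second-class word.

    docs: one set of words per piece of target attribute data.
    n_and: documents containing both words.
    n_or: by default the sum of the document counts of each word; with
    use_union=True, documents containing either word (Jaccard-style).
    """
    n_i = sum(1 for d in docs if first_word in d)
    n_j = sum(1 for d in docs if second_word in d)
    n_and = sum(1 for d in docs if first_word in d and second_word in d)
    n_or = n_i + n_j - n_and if use_union else n_i + n_j
    return n_and / n_or if n_or else 0.0

docs = [
    {"Tongxinluo", "cannot replace", "anticoagulant"},
    {"Tongxinluo", "angina pectoris"},
    {"cannot replace", "thrombosis"},
]
r = correlation(docs, "Tongxinluo", "cannot replace")  # 1 / (2 + 2)
```

Under the literal reading the ratio can be at most 0.5 (when the two words always co-occur); the union variant ranges from 0 to 1.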
4. Summary information of attribute data
In this embodiment, a summarization model based on the TextRank algorithm can be used to generate the summary information of each piece of attribute data. Although only this embodiment, using a TextRank-based summarization model, is described, those skilled in the art will appreciate that the scope of the invention is not limited to it. Other methods of obtaining summary information may be used without departing from the principle of the invention, and such modifications and substitutions fall within its scope.
Step S103: Apply business rules to the category labels, recommendation labels, inter-word association relationships and summary information, merge the resulting data, and extract it to the corresponding medicine application state analysis end, which responds to user requests by outputting the corresponding analysis results.
In this embodiment, user requests include medicine value analysis requests and medicine recommendation analysis requests. The data processing for these two kinds of request is described in detail below.
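A TextRank-style extractive summarizer ranks sentences with a weighted PageRank over a sentence-similarity graph and keeps the top-ranked ones. This is a minimal sketch, not the patent's model: sentences are assumed to be pre-tokenized, and the edge weight is a Jaccard word-overlap, a simplification of TextRank's own sentence-overlap similarity. The damping factor 0.85 and the fixed iteration count are conventional choices, assumed here.

```python
def textrank_summary(sentences, top_k=2, d=0.85, iterations=50):
    """Score tokenized sentences with weighted PageRank and return the top_k
    highest-scoring sentences in their original order."""
    n = len(sentences)
    sets = [set(s) for s in sentences]
    # Edge weight: Jaccard overlap of word sets; no self-loops.
    w = [[len(sets[i] & sets[j]) / len(sets[i] | sets[j])
          if i != j and sets[i] | sets[j] else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [(1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                                    for j in range(n) if out_weight[j] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]

sentences = [
    ["tongxinluo", "angina", "capsule"],
    ["tongxinluo", "angina", "study"],
    ["tongxinluo", "capsule", "angina", "trial"],
    ["weather", "today"],
]
summary = textrank_summary(sentences, top_k=2)
```

An isolated sentence that shares no words with the others keeps only the base score 1 - d, so off-topic sentences drop out of the summary.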
1. Analysis of drug value
In this embodiment, data processing may be performed according to the following steps to output a corresponding analysis result in response to a drug value analysis request:
Step 41: According to the target medicine name in the user request and the medicine name in the category label of each piece of attribute data, obtain the attribute data corresponding to the target medicine name as the attribute data to be analyzed. That is, the attribute data is screened by the target medicine name, and the attribute data whose category label contains the target medicine name is selected for analysis.
Step 42: According to the object label of each piece of attribute data to be analyzed, group the attribute data with the same object label into an analysis data set for that object label. That is, the attribute data is classified and summarized by object label, forming a separate analysis data set for each object label.
Step 43: Obtain the data type of each piece of attribute data to be analyzed in the analysis data set of the current object label, and count the attribute data of each data type. Obtain the first weight of each data type from a preset correspondence between data types and first weights, compute the weighted sum of the counts using these first weights, and output the result as the medicine value for the current object label.
In the present embodiment, the value of the medicine can be calculated according to the method shown in the following formula (2):
$$V = w_1 N_1 + \cdots + w_k N_k + \cdots + w_n N_n = \sum_{k=1}^{n} w_k N_k \qquad (2)$$
The parameters in formula (2) are as follows: V is the medicine value, w_k is the first weight of the k-th data type, N_k is the number of pieces of attribute data to be analyzed of the k-th data type, n is the total number of data types, and k = 1, ..., n.
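Steps 41 to 43 together with formula (2) amount to a filter, a group-by and a weighted count. In this sketch the record keys ("drug", "object_label", "data_type"), the function name and the first-weight values are hypothetical; the patent leaves the weight table to the implementer.

```python
from collections import Counter, defaultdict

def drug_values(attribute_data, target_drug, first_weights):
    """Return {object label: V}, with V = sum_k w_k * N_k as in formula (2)."""
    # Step 41: keep attribute data whose category label names the target medicine.
    selected = [r for r in attribute_data if r["drug"] == target_drug]
    # Step 42: one analysis data set per object label.
    groups = defaultdict(list)
    for r in selected:
        groups[r["object_label"]].append(r)
    # Step 43: N_k = count per data type, w_k = first weight of that data type.
    values = {}
    for label, records in groups.items():
        counts = Counter(r["data_type"] for r in records)
        values[label] = sum(first_weights.get(t, 0.0) * n for t, n in counts.items())
    return values

first_weights = {  # hypothetical correspondence between data types and first weights
    "first-author document": 3.0,
    "non-first-author document": 1.0,
    "conference speech": 2.0,
}
data = [
    {"drug": "Tongxinluo", "object_label": "doctor A", "data_type": "first-author document"},
    {"drug": "Tongxinluo", "object_label": "doctor A", "data_type": "first-author document"},
    {"drug": "Tongxinluo", "object_label": "doctor A", "data_type": "conference speech"},
    {"drug": "Tongxinluo", "object_label": "doctor B", "data_type": "non-first-author document"},
    {"drug": "drug X", "object_label": "doctor B", "data_type": "conference speech"},
]
values = drug_values(data, "Tongxinluo", first_weights)
```

For doctor A this gives 3.0 × 2 + 2.0 × 1 = 8.0; the "drug X" record is excluded by step 41. Passing a different weight table gives the value for a different analysis dimension.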
In this embodiment, steps 41 to 43 give the medicine value corresponding to the target label within a given time period, which can then be displayed as an image. For example, suppose the target medicine names in the user request are heart-meridian dredging, compound salvia miltiorrhiza, brain-heart dredging, musk heart protecting and ginseng-pine heart nourishing. Steps 41 to 43 give the medicine values of these medicines for the period from September 2016 to February 2020, and displaying these values as an image shows the trend in the value of each medicine more intuitively. Referring to FIG. 2, which shows an example display image of medicine values in this embodiment, the abscissa is time and the ordinate is the numerical medicine value. As can be seen from FIG. 2, the medicine values of heart-meridian dredging, compound salvia miltiorrhiza, brain-heart dredging, musk heart protecting and ginseng-pine heart nourishing over this period are 16, 3, 10 and 6, respectively.
In one embodiment, if the attribute data consists of documents published by doctors, conference speeches, medical research results (e.g. clinical guidelines) and the like, the data types may include, but are not limited to: documents published as first author, documents published as non-first author, conference speeches, medical research results, cited documents published as first author, and cited documents published as non-first author.
Further, in the embodiment, the medicine values under different dimension analyses can be calculated according to the attribute data to be analyzed of different data types.
Specifically, the value of the medicine in the "study dimension of doctors on medicine academia" can be calculated according to the attribute data to be analyzed of the data types such as the document published as the first author, the document published as the non-first author, the conference statement, and the medical research result, and the research degree of different doctors on the medicine can be obtained according to the value of the medicine, and the larger the value of the medicine corresponding to a doctor is, the larger the research degree of the doctor on the medicine is. The corresponding medicine values of different doctors can reflect the research directions and the key points of medicines of different doctors.
The drug value in the dimension of "doctors' influence on the drug" can be calculated from the attribute data to be analyzed of data types such as papers published as first author and cited, papers published as non-first author and cited, conference speeches and medical research results. This value indicates each doctor's degree of influence on the drug: the larger the drug value corresponding to a doctor, the greater that doctor's influence on the drug.
In this embodiment, a separate correspondence between data types and first weights may be set for each analysis dimension. When the drug value under a given dimension is calculated, the correspondence for that dimension is obtained first, and the drug value is then calculated from it using formula (2).
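A minimal Python sketch of the per-dimension drug value calculation follows. It assumes that formula (2) is the weighted sum of per-data-type counts described for the first data analysis means (filter by drug name, group by object label, weight the count of each data type). The record schema, the data-type keys and every number in the hypothetical `FIRST_WEIGHTS` table are placeholders, not values from this embodiment.

```python
from collections import Counter, defaultdict

# Hypothetical per-dimension "data type -> first weight" tables; the real
# weights are configuration values that the text does not give.
FIRST_WEIGHTS = {
    "academic_research": {
        "first_author_paper": 3.0,
        "non_first_author_paper": 1.0,
        "conference_speech": 1.0,
        "research_result": 2.0,
    },
    "influence": {
        "first_author_paper_cited": 3.0,
        "non_first_author_paper_cited": 1.5,
        "conference_speech": 1.0,
        "research_result": 2.0,
    },
}


def drug_values(records, target_drug, dimension):
    """records: dicts with 'drug', 'object_label' and 'data_type' keys (assumed schema)."""
    weights = FIRST_WEIGHTS[dimension]
    # Keep only attribute data whose category label names the target drug (cf. step 41).
    selected = [r for r in records if r["drug"] == target_drug]
    # One analysis data set per object label (cf. step 42).
    groups = defaultdict(list)
    for r in selected:
        groups[r["object_label"]].append(r["data_type"])
    # Weighted sum of per-data-type counts (cf. step 43); unweighted types count as 0.
    return {
        label: sum(weights.get(t, 0.0) * n for t, n in Counter(types).items())
        for label, types in groups.items()
    }
```

Only the weight table changes between analysis dimensions; the aggregation itself is shared, which matches the idea of one correspondence per dimension.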
In one embodiment, the drug values corresponding to the object labels can be sorted in descending order, the drug values whose rank is smaller than a preset rank are selected, and the selected drug values and their object labels are displayed graphically, so that the user can see the drug values and rankings of the different object labels more intuitively.
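The ranking-and-selection step can be sketched in a few lines of Python. This assumes ranks are 1-based positions in descending order of drug value and that "smaller than a preset rank" is a strict comparison; `top_ranked` is a hypothetical helper name.

```python
def top_ranked(values, preset_rank):
    """values: {object_label: drug_value}. Returns (label, value, rank) tuples
    whose 1-based rank in descending order is strictly below preset_rank."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (label, value, rank)
        for rank, (label, value) in enumerate(ranked, start=1)
        if rank < preset_rank
    ]
```

Ties keep their input order because Python's sort is stable; a real display might instead assign tied entries the same rank.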
2. Drug recommendation analysis
In this embodiment, data processing may be performed according to the following steps to output a corresponding analysis result in response to the drug recommendation analysis request:
Step 51: according to the target drug name in the user request and the drug name in the category label of each attribute data, acquire the attribute data corresponding to the target drug name as the attribute data to be analyzed. That is, the attribute data are screened by the target drug name, and those whose category label contains the target drug name are selected as the attribute data to be analyzed.
Step 52: divide each attribute data to be analyzed into sentences; obtain the first-subclass words in each attribute data to be analyzed whose semantics correspond to the target drug name; count, from the sentence division result, the first sentence number, i.e., the number of sentences in each attribute data to be analyzed that contain a first-subclass word; and output the volume value of the target drug name according to the first sentence number. The attention paid to the target drug name follows from the volume value: the larger the volume value, the greater the attention.
For example, if the screening in step 51 yields only one attribute data to be analyzed, namely doctor-patient interaction data containing 10 dialogue turns, of which 4 contain a first-subclass word, then the volume value of the target drug name is 4.
Step 53: obtain the indication corresponding to the target drug name from the category label of each attribute data to be analyzed; obtain the second-subclass words in each attribute data to be analyzed whose semantics correspond to the indication; count, from the sentence division result, the second sentence number, i.e., the number of sentences containing a second-subclass word; and output the volume value of the indication corresponding to the target drug name according to the second sentence number. The attention paid to that indication follows from the volume value: the larger the volume value, the greater the attention paid to the indication of the target drug.
For example, if the screening in step 51 yields only one attribute data to be analyzed, namely doctor-patient interaction data containing 10 dialogue turns, of which 3 contain a second-subclass word, then the volume value of the indication is 3.
Step 54: according to the recommendation label of each attribute data to be analyzed, obtain the second-class words in each attribute data to be analyzed whose semantics correspond to that recommendation label; count, from the sentence division result, the third sentence number, i.e., the number of sentences containing a second-class word; and output the volume value of the recommendation label corresponding to the target drug name according to the third sentence number. The degree to which the target drug is recommended follows from these volume values: if the volume value of "recommended" is far larger than that of "not recommended", the target drug is recommended far more often than it is advised against.
For example, if the screening in step 51 yields only one attribute data to be analyzed, namely doctor-patient interaction information containing 10 dialogue turns, of which 1 contains a second-class word, then the volume value of the recommendation label is 1.
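Steps 52 to 54 share one counting pattern: split text into sentences and count the sentences that mention any word from a given set. The Python sketch below assumes that sentences are split on Chinese/English sentence punctuation and newlines (one dialogue turn per line), that semantic correspondence is approximated by plain substring matching against a hand-supplied word set, and that counts are summed across all attribute data to be analyzed. The patent specifies none of these details.

```python
import re

# Sentence/turn delimiters (assumption): Chinese and English end punctuation plus newlines.
_SENTENCE_SPLIT = re.compile(r"[。！？!?；;\n]+")


def split_sentences(text):
    """Divide one attribute data text into non-empty sentences."""
    return [s.strip() for s in _SENTENCE_SPLIT.split(text) if s.strip()]


def volume_value(documents, words):
    """Count sentences, over all documents, that contain at least one word from `words`.

    Substring matching stands in for the semantic matching of the text (assumption).
    """
    total = 0
    for doc in documents:
        for sentence in split_sentences(doc):
            if any(w in sentence for w in words):
                total += 1
    return total
```

Calling `volume_value` with the drug-name words, the indication words and the recommendation-label words in turn gives the three volume values of steps 52, 53 and 54.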
In one embodiment, after the correlation degree between each first-class word and each second-class word in each attribute data to be analyzed has been obtained, the correlations can be displayed graphically so that the association between first-class and second-class words is shown more intuitively. This specifically includes:
Step 61: set the radius value of each first-class word according to the reciprocal of its correlation degree in each attribute data to be analyzed. The correlation degree of each first-class word can be obtained through steps 31 to 34.
Step 62: collect the first-class words in all attribute data to be analyzed and group identical words together to obtain a word set for each first-class word, where each word set contains the occurrences of that first-class word and their radius values. For example, if the screening in step 51 yields 5 attribute data to be analyzed, of which 3 contain the first-class word A and 2 contain the first-class word B, grouping these 5 first-class words by identity gives a word set for A and a word set for B. The word set for A contains word A and its radius values in the 3 attribute data to be analyzed, and the word set for B contains word B and its radius values in the 2 attribute data to be analyzed.
Step 63: for each word set, average the radius values of all first-class words in it, and set the radius value of the corresponding first-class word to this average.
Step 64: display an image with the second-class word at the center and each first-class word placed at its radius value (the average radius calculated in step 63). The size of the pattern marker of each first-class word depends on the total number of times that first-class word co-occurs with the second-class word in the same target attribute data: the larger this total, the larger the marker. For example, Fig. 3 shows a display image of first-class and second-class words in this embodiment. As shown in Fig. 3, the first-class words include indications of heart-meridian dredging such as coronary heart disease, hypertension and chest distress, and the second-class word is "strongly recommended". Coronary heart disease is the most closely related to "strongly recommended", which indicates that heart-meridian dredging is most strongly recommended for treating coronary heart disease. Further, in one embodiment, the volume values of a first-class word (the target drug name and/or an indication of the target drug name) over a period of time can also be obtained through steps 51 to 53, and the color of that word's pattern marker is then set according to the trend of its volume value. For example, if the volume value is rising, the marker is set to red; if it is falling, the marker is set to blue.
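Steps 61 to 63 and the marker coloring rule can be sketched in Python as follows. Assumptions: correlations arrive as (first-class word, correlation) pairs, one per attribute data to be analyzed; zero correlations are skipped because their reciprocal is undefined; and the volume trend is judged by comparing the last value of the series with the first, a crude stand-in for trend detection (a regression slope would be another option). The text does not say what color a flat trend gets, so the sketch returns `None`.

```python
from collections import defaultdict
from statistics import mean


def average_radii(correlations):
    """correlations: iterable of (first_class_word, correlation_degree) pairs."""
    radii = defaultdict(list)
    for word, r in correlations:
        if r > 0:  # reciprocal undefined for zero correlation (assumption: skip)
            radii[word].append(1.0 / r)  # step 61: radius from reciprocal of correlation
    # Steps 62-63: group identical words and average their radii.
    return {word: mean(values) for word, values in radii.items()}


def marker_color(volumes):
    """Color a first-class word's marker by the trend of its volume values over time."""
    if len(volumes) < 2 or volumes[-1] == volumes[0]:
        return None  # flat or too short: color not specified in the text
    return "red" if volumes[-1] > volumes[0] else "blue"
```

Because the radius is a reciprocal, strongly correlated words end up closer to the second-class word at the center of the image.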
In one embodiment, analyzing the recommendation value of the target drug according to the recommendation label of each attribute data to be analyzed specifically includes:
Step 71: group the attribute data to be analyzed that share the same type of recommendation label, and count the attribute data to be analyzed under each type of recommendation label. That is, the attribute data are classified and summarized by recommendation label to obtain the number of attribute data to be analyzed for each recommendation label.
Step 72: obtain the second weight of each type of recommendation label from a preset correspondence between recommendation label types and second weights, compute the weighted sum of the counts per label type with these second weights, and output the result as the recommendation value of the target drug name. In this embodiment, the recommendation value may be calculated with formula (3):
R = w_1 \times N_1 + \cdots + w_l \times N_l + \cdots + w_m \times N_m \qquad (3)
The meaning of each parameter in formula (3):
R is the recommendation value, w_l is the second weight corresponding to the l-th recommendation label type, N_l is the number of attribute data to be analyzed of the l-th recommendation label type, m is the total number of recommendation label types, and l = 1, ..., m.
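Formula (3) is a plain weighted count, sketched below in Python. The label names and the values in the hypothetical `SECOND_WEIGHTS` table (including a negative weight for "not recommended") are illustrative assumptions; the text only says the second weights come from a preset correspondence.

```python
from collections import Counter

# Hypothetical "recommendation label type -> second weight" correspondence.
SECOND_WEIGHTS = {
    "strongly_recommended": 2.0,
    "recommended": 1.0,
    "not_recommended": -1.0,
}


def recommendation_value(labels, weights=SECOND_WEIGHTS):
    """labels: the recommendation label of each attribute data to be analyzed."""
    counts = Counter(labels)  # step 71: N_l for each label type
    # Step 72 / formula (3): R = sum over l of w_l * N_l; unknown labels weigh 0.
    return sum(weights.get(label, 0.0) * n for label, n in counts.items())
```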
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Referring to Fig. 4, Fig. 4 is a schematic diagram of the main structure of a drug application analysis system based on medical big data according to an embodiment of the present invention. As shown in Fig. 4, the system mainly includes a first data processing device 11, a second data processing device 12 and a data analysis device 13. In some embodiments, some or all of the first data processing device 11, the second data processing device 12 and the data analysis device 13 may be combined into one module. In some embodiments, the first data processing device 11 may be configured to collect internal data and external data and process the collected data to form different types of tagged data. The second data processing device 12 may be configured to, based on one or more types of tagged data and the attribute data of the corresponding one or more objects: classify the attribute data of the objects with a classification model algorithm to determine the category label of each attribute data; predict the recommendation probability of each attribute data with a neural network prediction model algorithm and output the prediction result; identify inter-word association relationships in each attribute data with a vocabulary association analysis model algorithm; and generate digest information for each attribute data with a digest algorithm model.
The data analysis device 13 may be configured to apply business rules to the determined category labels, the recommendation probability predictions, the association relationships and the digest information, merge the resulting data, and extract it to the corresponding drug application state analysis end, so as to output the corresponding analysis result in response to a user request. In one embodiment, the specific functions are as described in steps S101 to S103.
In one embodiment, the first data processing apparatus 11 may be configured to perform the following operations:
according to preset label types, acquiring the objects corresponding to each label type from the collected data, and setting a corresponding object label for each object;
acquiring the attribute data associated with each object from the collected data;
and setting a label on the attribute data associated with each object according to that object's object label, and obtaining the tagged data of each label type from the objects of that label type, their object labels and their associated attribute data. In one embodiment, the specific functions are as described in steps S1011 to S1013.
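The three tagging operations above can be pictured as building one container per preset label type. The Python sketch below is illustrative only: the input schema (`object`, `label_type`, `attribute` keys), the `TaggedData` container and the `"<type>:<n>"` object-label scheme are all hypothetical, since the text does not define how object labels are formed.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedData:
    """Tagged data for one preset label type (hypothetical container)."""
    label_type: str
    object_labels: dict = field(default_factory=dict)   # object -> object label
    attribute_data: dict = field(default_factory=dict)  # object label -> attribute records


def build_tagged_data(collected, label_types):
    """collected: dicts with 'object', 'label_type' and 'attribute' keys (assumed schema)."""
    tagged = {t: TaggedData(t) for t in label_types}
    for rec in collected:
        td = tagged.get(rec["label_type"])
        if td is None:
            continue  # the record's object does not belong to any preset label type
        # Assign an object label on first sight; "<type>:<n>" is an illustrative scheme.
        label = td.object_labels.setdefault(
            rec["object"], f"{td.label_type}:{len(td.object_labels) + 1}"
        )
        # Tag the attribute data with its object's label.
        td.attribute_data.setdefault(label, []).append(rec["attribute"])
    return tagged
```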
In one embodiment, the second data processing device 12 includes a category label acquisition module, and/or a recommendation label acquisition module, and/or an inter-word association relationship identification module, and/or a digest information generation module.
The category label acquisition module may be configured to perform the following operations: acquire data samples from a preset training set, each data sample comprising an attribute sample and its category label, where a category label comprises a drug name and one or more corresponding indications; train a pre-constructed naive Bayes classifier on the data samples with a machine learning algorithm; and classify each attribute data with the trained naive Bayes classifier to obtain its category label. In one embodiment, the specific functions are as described in steps 11 to 12.
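The text does not say which naive Bayes variant is used. As a sketch, the Python class below implements a multinomial naive Bayes with Laplace (add-one) smoothing over pre-tokenized text, written from scratch so that it needs no third-party library; in practice a library implementation such as scikit-learn's `MultinomialNB` would be a common choice. Category labels are modeled as (drug name, indication) tuples, and "drug X" in the usage example is a hypothetical placeholder.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        # Prior counts: number of training samples carrying each category label.
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # Per-label word frequencies.
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.total)  # log prior
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = label, score
        return best
```

Usage: `NaiveBayes().fit(token_lists, labels).predict(["angina", "chest"])` returns the (drug name, indication) tuple with the highest posterior score. Chinese text would first need word segmentation.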
The recommendation label acquisition module may be configured to perform the following operations: determine the recommendation category and recommendation category weight of each word in a preset corpus from preset seed words and their recommendation categories and recommendation category weights; construct a word vector model from the word vector, recommendation category and recommendation category weight of each word; use the word vector model to obtain the word vector and recommendation category weight of each word in the attribute data, and derive the feature vector of the attribute data from them; predict, with an LSTM model, the probability of each recommendation category for the attribute data from its feature vector; and set the recommendation label of the attribute data to the recommendation category with the maximum probability. In one embodiment, the specific functions are as described in steps 21 to 24.
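The text does not say how word vectors and recommendation category weights are combined into a feature vector. The Python sketch below assumes that each word vector is scaled by its word's category weight and that the scaled vectors form the input sequence for the LSTM. The LSTM itself is not implemented here: `recommendation_label` only performs the final step of picking the category with the highest predicted probability. The default weight of 1.0 for words without a category weight is also an assumption.

```python
def feature_sequence(tokens, word_vectors, category_weights):
    """Scale each known word's vector by its recommendation category weight.

    Words missing from the word vector model are skipped (assumption).
    """
    seq = []
    for w in tokens:
        if w in word_vectors:
            weight = category_weights.get(w, 1.0)  # default weight 1.0 (assumption)
            seq.append([weight * x for x in word_vectors[w]])
    return seq


def recommendation_label(probabilities):
    """Return the recommendation category with the maximum predicted probability."""
    return max(probabilities, key=probabilities.get)
```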
The inter-word association relationship identification module may be configured to perform the following operations: screen the attribute data according to preset screening conditions to obtain target attribute data; segment the target attribute data into words and perform semantic analysis on each word based on the segmentation result; according to the semantic analysis result, obtain the first-class words, whose semantics correspond to the category label of the target attribute data, and the second-class words, whose semantics correspond to the recommendation label of the target attribute data, where the first-class words comprise first-subclass words whose semantics correspond to the names of the traditional Chinese medicines in the category labels and/or second-subclass words whose semantics correspond to the indications in the category labels; and calculate the correlation degree between each first-class word and each second-class word according to the correlation formula
R_{ij} = \frac{N_{i\_and\_j}}{N_{i\_or\_j}}
where R_ij is the correlation degree between the i-th first-class word and the j-th second-class word, N_i_and_j is the number of times the i-th first-class word and the j-th second-class word appear together in the same target attribute data, and N_i_or_j is the sum of the numbers of times the i-th first-class word and the j-th second-class word appear in all target attribute data. In one embodiment, the specific functions are as described in steps 31 to 34.
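A Python sketch of the correlation calculation follows, assuming the ratio form R_ij = N_i_and_j / N_i_or_j. Counting is done at the document level (a word present in one target attribute data counts once, however often it repeats). That is an interpretation of "the number of times ... appear", not something the text states. Under this counting, a pair that always appears together scores 0.5.

```python
def correlation_degrees(target_docs, first_words, second_words):
    """target_docs: token lists, one per target attribute data.

    Returns {(first_word, second_word): R} with R = N_and / N_or, counted per document.
    """
    doc_sets = [set(tokens) for tokens in target_docs]

    def count(*words):
        # Number of target attribute data containing all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    result = {}
    for a in first_words:
        for b in second_words:
            n_and = count(a, b)           # co-occurrence in the same attribute data
            n_or = count(a) + count(b)    # sum of individual occurrence counts
            result[(a, b)] = n_and / n_or if n_or else 0.0
    return result
```

In the usage test, "coronary heart disease" co-occurs with "strongly recommended" more often than "hypertension" does, so it gets the higher correlation.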
The digest information generation module may be configured to generate the digest information of each attribute data with a digest algorithm model based on the TextRank algorithm. Note that although the present invention provides only the specific embodiment of generating digest information with a TextRank-based digest algorithm model, those skilled in the art will understand that the scope of the present invention is not limited to this embodiment. Without departing from the principle of the present invention, those skilled in the art may obtain the digest information of the attribute data by other summarization methods, and such modifications and substitutions fall within the scope of the present invention.
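As an illustration of TextRank-based extractive summarization, the Python sketch below ranks sentences with the sentence-similarity weighting from the original TextRank formulation (shared words divided by log|S_i| + log|S_j|) and a PageRank-style iteration with damping 0.85. It is not the patent's own digest model. It splits sentences on end punctuation and tokenizes on whitespace, so Chinese text would first need word segmentation. The fixed iteration count is a simplification of running to convergence.

```python
import math
import re


def textrank_summary(text, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences of `text`, in their original order."""
    sentences = [s.strip() for s in re.split(r"[。！？.!?]+", text) if s.strip()]
    words = [s.lower().split() for s in sentences]
    n = len(sentences)
    # Edge weights: word overlap normalized by log sentence lengths.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            overlap = len(set(words[i]) & set(words[j]))
            denom = math.log(len(words[i])) + math.log(len(words[j]))
            if overlap and denom > 0:
                sim[i][j] = overlap / denom
    out_weight = [sum(row) for row in sim]
    # Weighted PageRank iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j] for j in range(n) if sim[j][i] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```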
In one embodiment, the data analysis device 13 comprises a first data analysis device and/or a second data analysis device.
The first data analysis means may be configured to perform the following operations when the user request is a drug value analysis: according to the target drug name in the user request and the drug name in the category label of each attribute data, acquire the attribute data corresponding to the target drug name as the attribute data to be analyzed; group the attribute data to be analyzed that share the same object label to form an analysis data set for each object label; and, for the analysis data set of the current object label, obtain the data type of each attribute data to be analyzed, count the attribute data to be analyzed of each data type, obtain the first weight of each data type from a preset correspondence between data types and first weights, compute the weighted sum of the counts with the first weights, and output the result as the drug value of the current object label. In one embodiment, the specific functions are as described in steps 41 to 43.
The second data analysis means may be configured to perform the following operations when the user request is a drug recommendation analysis: according to the target drug name in the user request and the drug name in the category label of each attribute data, acquire the attribute data corresponding to the target drug name as the attribute data to be analyzed; divide each attribute data to be analyzed into sentences, obtain the first-subclass words whose semantics correspond to the target drug name, count from the sentence division result the first sentence number (the number of sentences containing a first-subclass word), and output the volume value of the target drug name according to the first sentence number; obtain the indication corresponding to the target drug name from the category label of each attribute data to be analyzed, obtain the second-subclass words whose semantics correspond to the indication, count from the sentence division result the second sentence number (the number of sentences containing a second-subclass word), and output the volume value of the indication corresponding to the target drug name according to the second sentence number; and, according to the recommendation label of each attribute data to be analyzed, obtain the second-class words whose semantics correspond to that recommendation label, count from the sentence division result the third sentence number (the number of sentences containing a second-class word), and output the volume value of the recommendation label corresponding to the target drug name according to the third sentence number.
In one embodiment, the specific implementation functions may be described in reference to steps 51-54.
Further, in this embodiment, the second data analysis means may be configured to perform the following operations when the user request is a drug recommendation analysis: set the radius value of each first-class word according to the reciprocal of its correlation degree in each attribute data to be analyzed; collect the first-class words in all attribute data to be analyzed and group identical words together to obtain a word set for each first-class word; average the radius values of all first-class words in each word set and set the radius value of the corresponding first-class word to the result; and display an image with the second-class word at the center and each first-class word at its radius value, where the size of each first-class word's pattern marker depends on the total number of times that first-class word co-occurs with the second-class word in the same target attribute data. In one embodiment, the specific functions are as described in steps 61 to 64.
Further, in this embodiment, the second data analysis means may be configured to perform the following operations when the user request is a drug recommendation analysis: group the attribute data to be analyzed that share the same type of recommendation label and count the attribute data to be analyzed under each recommendation label; obtain the second weight of each type of recommendation label from a preset correspondence between recommendation label types and second weights, compute the weighted sum of the counts with the second weights, and output the result as the recommendation value of the target drug name. In one embodiment, the specific functions are as described in steps 71 to 72.
The above drug application analysis system based on medical big data is used to execute the embodiment of the drug application analysis method based on medical big data shown in Fig. 1. The two have similar technical principles, solve similar technical problems and produce similar technical effects. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the system and the related descriptions can be found in the method embodiment and are not repeated here.
Furthermore, the present invention also provides a storage device. In this storage device embodiment, the storage device may be configured to store a program for executing the drug application analysis method based on medical big data of the above method embodiment, and the program may be loaded and executed by a processor to implement that method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed here. The storage device may be formed by various electronic devices; optionally, in the embodiment of the present invention, the storage device is a non-transitory computer-readable storage medium.
Furthermore, the present invention also provides a control device. In this control device embodiment, the control device comprises a processor and a storage device. The storage device may be configured to store a program for executing the drug application analysis method based on medical big data of the above method embodiment, and the processor may be configured to execute the programs in the storage device, including but not limited to that program. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed here. The control device may be formed by various electronic devices; optionally, in the embodiment of the present invention, the control device is a server.
Those skilled in the art will understand that all or part of the flow of the method in the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals and software distribution media. It should be noted that what the computer-readable medium covers may be appropriately expanded or restricted as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
Further, it should be understood that, since the modules are only used to illustrate the functional units of the system of the present invention, the physical device corresponding to a module may be the processor itself, or part of the software, part of the hardware, or part of a combination of software and hardware in the processor. Thus, the number of modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solutions of the present invention have been described with reference to the embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of protection of the present invention is obviously not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions to the related technical features without departing from the principle of the present invention, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.

Claims (10)

1. A method for drug application analysis based on medical big data, the method comprising:
acquiring internal data and external data and respectively processing the acquired data to form different types of tagged data;
classifying the attribute data of the one or more objects according to a classification model algorithm based on the tagged data of one or more different types and the attribute data of the one or more corresponding objects respectively to determine category labels of the attribute data, predicting recommendation probability of the attribute data of the one or more objects according to a neural network prediction model algorithm and outputting a prediction result, identifying an association relation between words for the attribute data of the one or more objects according to a vocabulary association analysis model algorithm, and generating abstract information for the attribute data of the one or more objects according to an abstract algorithm model;
and performing data merging processing after calculating through a business rule according to the determined category label, the prediction result of the recommendation probability, the association relation and the abstract information, and extracting the data to a corresponding medicine application state analysis end so as to respond to a user request and output a corresponding analysis result.
2. The method for drug application analysis based on medical big data according to claim 1, wherein the step of respectively processing the collected data to form different types of tagged data specifically comprises:
acquiring, from the collected data, the objects corresponding to each preset label type, and setting a corresponding object label for each object;
acquiring, from the collected data, the attribute data associated with each object;
and setting the label of the attribute data associated with each object according to that object's label, and obtaining the tagged data of each label type from the objects of that label type, their object labels and their associated attribute data.
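The tagging steps of claim 2 amount to finding the objects of each preset label type and propagating each object's label to its attribute data. A minimal Python sketch follows; the record layout (`object`/`text` keys), the `label_type:object` label format and the `TaggedData` container are illustrative assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedData:
    """Tagged data for one preset label type (hypothetical container)."""
    label_type: str
    object_labels: dict = field(default_factory=dict)   # object name -> object label
    attribute_data: list = field(default_factory=list)  # (object label, attribute text) pairs


def build_tagged_data(records, label_types):
    """Build tagged data for each preset label type.

    records:     collected data as dicts with hypothetical keys "object" and "text".
    label_types: preset label types, each mapped to the set of object names of that type.
    """
    result = {label_type: TaggedData(label_type) for label_type in label_types}
    for record in records:
        obj = record["object"]
        for label_type, objects in label_types.items():
            if obj not in objects:
                continue  # the object does not belong to this label type
            tagged = result[label_type]
            # Set the object label once, the first time the object is seen.
            label = tagged.object_labels.setdefault(obj, f"{label_type}:{obj}")
            # The attribute data inherits the label of its associated object.
            tagged.attribute_data.append((label, record["text"]))
    return result
```

Records whose object matches no preset label type are simply left out of the tagged data in this sketch.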
3. The method for drug application analysis based on medical big data according to claim 2, wherein
the step of classifying the attribute data of the one or more objects with the classification model algorithm to determine the category label of each item of attribute data specifically comprises:
acquiring data samples from a preset training set, the data samples comprising attribute samples and their corresponding category labels, wherein a category label comprises a drug name and one or more corresponding indications;
training a pre-constructed naive Bayes classifier on the data samples using a machine learning algorithm;
classifying each item of attribute data with the trained naive Bayes classifier to obtain its category label;
and/or,
the step of predicting the recommendation probability of each item of attribute data of the one or more objects with the neural network prediction model algorithm and outputting the prediction result specifically comprises:
determining, from preset seed words and their corresponding recommendation categories and recommendation category weights, the recommendation category and recommendation category weight of each word in a preset corpus;
constructing a word vector model from the word vector, recommendation category and recommendation category weight of each word;
obtaining, with the word vector model, the word vector and recommendation category weight of each word in the attribute data, and obtaining the feature vector of the attribute data from these word vectors and recommendation category weights;
predicting, with an LSTM model, the probability of each recommendation category for the attribute data from its feature vector;
setting the recommendation label of the attribute data to the recommendation category with the highest probability;
and/or,
the step of identifying the inter-word association relations in each item of attribute data of the one or more objects with the vocabulary association analysis model algorithm specifically comprises:
screening the attribute data according to preset screening conditions to obtain target attribute data;
performing word segmentation on the target attribute data, and performing semantic analysis on each word in the target attribute data based on the segmentation result;
obtaining, from the semantic analysis result, first-class words whose semantics correspond to the category labels of the target attribute data and second-class words whose semantics correspond to the recommendation labels of the target attribute data; the first-class words comprise first-subclass words whose semantics correspond to the drug names in the category labels and/or second-subclass words whose semantics correspond to the indications in the category labels;
calculating the degree of correlation between each first-class word and each second-class word according to the correlation formula
$$R_{ij} = \frac{N_{i\_and\_j}}{N_{i\_or\_j}}$$
where $R_{ij}$ is the degree of correlation between the $i$-th first-class word and the $j$-th second-class word, $N_{i\_and\_j}$ is the number of times the $i$-th first-class word and the $j$-th second-class word appear together in the same target attribute data item, and $N_{i\_or\_j}$ is the sum of the numbers of occurrences of the $i$-th first-class word and the $j$-th second-class word across all target attribute data;
and/or,
the step of generating abstract information for each item of attribute data of the one or more objects with the abstract algorithm model specifically comprises: generating the corresponding abstract information for each item of attribute data with an abstract algorithm model based on the TextRank algorithm.
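Claim 3 does not say which naive Bayes variant is used. The sketch below assumes a multinomial naive Bayes with add-one (Laplace) smoothing over word-segmented attribute data, with each category label represented as a (drug name, indications) tuple; the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples: (tokens, category_label) pairs from the preset training set.
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for tokens, label in samples:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_samples = len(samples)
        return self

    def predict(self, tokens):
        """Return the category label with the highest posterior log-probability."""
        best_label, best_score = None, -math.inf
        v = len(self.vocab)
        for label, count in self.class_counts.items():
            # log P(label) + sum over known words of log P(word | label)
            score = math.log(count / self.n_samples)
            for w in tokens:
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when an attribute data item contains many words.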
4. The method for drug application analysis based on medical big data according to claim 3, wherein
when the user request is a drug value analysis, the step of applying business rules to the determined category labels, the prediction result of the recommendation probability, the association relations and the abstract information, merging the resulting data, and extracting the merged data to a corresponding drug application state analysis end so as to respond to the user request and output a corresponding analysis result specifically comprises:
taking, according to the target drug name in the user request and the drug name in the category label of each item of attribute data, the attribute data corresponding to the target drug name as attribute data to be analyzed;
grouping the attribute data to be analyzed that share the same object label to form an analysis data set for each object label;
obtaining the data type of each item of attribute data to be analyzed in the analysis data set of the current object label and counting the items of each data type; obtaining the first weight of each data type from a preset correspondence between data types and first weights; computing a weighted sum of the counts with the first weights; and outputting the drug value of the current object label according to the result;
and/or,
when the user request is a drug recommendation analysis, the step of applying business rules to the determined category labels, the prediction result of the recommendation probability, the association relations and the abstract information, merging the resulting data, and extracting the merged data to a corresponding drug application state analysis end so as to respond to the user request and output a corresponding analysis result specifically comprises:
taking, according to the target drug name in the user request and the drug name in the category label of each item of attribute data, the attribute data corresponding to the target drug name as attribute data to be analyzed;
dividing each item of attribute data to be analyzed into sentences; obtaining the first-subclass words in each item whose semantics correspond to the target drug name; counting, from the sentence division result, a first number of sentences containing the first-subclass words; and outputting the mention volume of the target drug name according to the first number of sentences;
obtaining the indications corresponding to the target drug name from the category label of each item of attribute data to be analyzed; obtaining the second-subclass words in each item whose semantics correspond to those indications; counting, from the sentence division result, a second number of sentences containing the second-subclass words; and outputting the mention volume of the indications of the target drug name according to the second number of sentences;
obtaining, according to the recommendation label of each item of attribute data to be analyzed, the second-class words in each item whose semantics correspond to that recommendation label; counting, from the sentence division result, a third number of sentences containing these second-class words; and outputting the mention volume of the recommendation labels of the target drug name according to the third number of sentences;
and/or,
setting a radius value for each first-class word according to the reciprocal of the degree of correlation of that first-class word in each item of attribute data to be analyzed;
obtaining the first-class words in all attribute data to be analyzed, and grouping identical first-class words together to obtain a word set for each first-class word;
averaging the radius values of all first-class words in each word set, and setting the result as the radius value of the first-class word of that word set;
displaying a graph in which the second-class word is the center of the circle and each first-class word is placed at its radius value; the size of the marker of each first-class word depends on the sum of the numbers of occurrences of that first-class word and the second-class word in a single target attribute data item;
and/or,
grouping the attribute data to be analyzed that share the same type of recommendation label, and counting the items of attribute data to be analyzed for each recommendation label;
obtaining the second weight of each type of recommendation label from a preset correspondence between recommendation label types and second weights, computing a weighted sum of the counts with the second weights, and outputting the recommendation value of the target drug name according to the result.
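The drug value and the recommendation value of claim 4 are both weighted sums of per-category counts, and the mention volumes are sentence counts. A minimal sketch, assuming the preset weights are plain dictionaries, that a category without a preset weight contributes zero, and that sentences end at common Chinese or English terminators (a naive split that would also break on decimals such as "2.5 mg"). The function names are hypothetical, and how a raw score is turned into the displayed value is left to the business rules.

```python
import re
from collections import Counter


def weighted_count_score(categories, weights):
    """Weighted sum of counts per category.

    categories: one data type (drug value) or recommendation label
                (recommendation value) per item of attribute data to be analyzed.
    weights:    preset category -> weight mapping; categories without a
                preset weight contribute 0.
    """
    counts = Counter(categories)
    return sum(n * weights.get(category, 0.0) for category, n in counts.items())


# Naive sentence terminators: Chinese and English full stops, exclamation and question marks.
_SENTENCE_END = re.compile(r"[。！？!?.]+")


def mention_volume(texts, terms):
    """Number of sentences, across all texts, containing at least one of `terms`."""
    total = 0
    for text in texts:
        sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
        total += sum(1 for s in sentences if any(t in s for t in terms))
    return total
```

Passing several `terms` lets synonyms of a drug name or indication count toward the same mention volume.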
5. A system for drug application analysis based on medical big data, the system comprising:
a first data processing device configured to collect internal data and external data and process the collected data to form different types of tagged data;
a second data processing device configured to, based on one or more different types of the tagged data and the attribute data of the one or more objects respectively corresponding to them, classify the attribute data of the one or more objects with a classification model algorithm to determine the category label of each item of attribute data, predict the recommendation probability of the attribute data of the one or more objects with a neural network prediction model algorithm and output a prediction result, identify inter-word association relations in the attribute data of the one or more objects with a vocabulary association analysis model algorithm, and generate abstract information for the attribute data of the one or more objects with an abstract algorithm model;
and a data analysis device configured to apply business rules to the determined category labels, the prediction result of the recommendation probability, the association relations and the abstract information, merge the resulting data, and extract the merged data to a corresponding drug application state analysis end so as to respond to a user request and output a corresponding analysis result.
6. The system for drug application analysis based on medical big data according to claim 5, wherein the first data processing device is configured to perform the following operations:
acquiring, from the collected data, the objects corresponding to each preset label type, and setting a corresponding object label for each object;
acquiring, from the collected data, the attribute data associated with each object;
and setting the label of the attribute data associated with each object according to that object's label, and obtaining the tagged data of each label type from the objects of that label type, their object labels and their associated attribute data.
7. The system for drug application analysis based on medical big data according to claim 6, wherein the second data processing device comprises a category label acquisition module, and/or a recommendation label acquisition module, and/or an inter-word association recognition module, and/or an abstract information generation module;
the category label acquisition module is configured to perform the following operations:
acquiring data samples from a preset training set, the data samples comprising attribute samples and their corresponding category labels, wherein a category label comprises a drug name and one or more corresponding indications;
training a pre-constructed naive Bayes classifier on the data samples using a machine learning algorithm;
classifying each item of attribute data with the trained naive Bayes classifier to obtain its category label;
the recommendation label acquisition module is configured to perform the following operations:
determining, from preset seed words and their corresponding recommendation categories and recommendation category weights, the recommendation category and recommendation category weight of each word in a preset corpus;
constructing a word vector model from the word vector, recommendation category and recommendation category weight of each word;
obtaining, with the word vector model, the word vector and recommendation category weight of each word in the attribute data, and obtaining the feature vector of the attribute data from these word vectors and recommendation category weights;
predicting, with an LSTM model, the probability of each recommendation category for the attribute data from its feature vector;
setting the recommendation label of the attribute data to the recommendation category with the highest probability;
the inter-word association recognition module is configured to perform the following operations:
screening the attribute data according to preset screening conditions to obtain target attribute data;
performing word segmentation on the target attribute data, and performing semantic analysis on each word in the target attribute data based on the segmentation result;
obtaining, from the semantic analysis result, first-class words whose semantics correspond to the category labels of the target attribute data and second-class words whose semantics correspond to the recommendation labels of the target attribute data; the first-class words comprise first-subclass words whose semantics correspond to the drug names in the category labels and/or second-subclass words whose semantics correspond to the indications in the category labels;
calculating the degree of correlation between each first-class word and each second-class word according to the correlation formula
$$R_{ij} = \frac{N_{i\_and\_j}}{N_{i\_or\_j}}$$
where $R_{ij}$ is the degree of correlation between the $i$-th first-class word and the $j$-th second-class word, $N_{i\_and\_j}$ is the number of times the $i$-th first-class word and the $j$-th second-class word appear together in the same target attribute data item, and $N_{i\_or\_j}$ is the sum of the numbers of occurrences of the $i$-th first-class word and the $j$-th second-class word across all target attribute data;
the abstract information generation module is configured to generate corresponding abstract information of each attribute data by using an abstract algorithm model based on a TextRank algorithm.
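The TextRank-based abstract model is not specified further. The sketch below follows the sentence-extraction form of TextRank: sentences are graph nodes, edges are weighted by shared words normalized by the log sentence lengths, and a weighted PageRank iteration ranks the sentences. It assumes sentences are already segmented into words; the damping factor 0.85, the fixed iteration count and the function names are assumptions.

```python
import math


def sentence_similarity(a, b):
    """TextRank similarity: shared words divided by the sum of log sentence lengths."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    denominator = math.log(len(a)) + math.log(len(b))
    if overlap == 0 or denominator == 0:
        return 0.0  # no shared words, or two one-word sentences
    return overlap / denominator


def textrank_summary(sentences, k=2, damping=0.85, iterations=50):
    """Return the indices of the k highest-ranked sentences, in original order.

    sentences: list of sentences, each a list of words (segmentation assumed done).
    """
    n = len(sentences)
    weights = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank: each neighbour j passes on its score in proportion
        # to the weight of edge (j, i) relative to all of j's edge weights.
        scores = [(1 - damping) + damping * sum(
                      weights[j][i] / out_weight[j] * scores[j]
                      for j in range(n) if weights[j][i] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Returning the selected sentences in their original order keeps the extracted abstract readable.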
8. The system for drug application analysis based on medical big data according to claim 7, wherein the data analysis device comprises a first data analysis device and/or a second data analysis device;
the first data analysis device is configured to perform the following operations when the user request is a drug value analysis:
taking, according to the target drug name in the user request and the drug name in the category label of each item of attribute data, the attribute data corresponding to the target drug name as attribute data to be analyzed;
grouping the attribute data to be analyzed that share the same object label to form an analysis data set for each object label;
obtaining the data type of each item of attribute data to be analyzed in the analysis data set of the current object label and counting the items of each data type; obtaining the first weight of each data type from a preset correspondence between data types and first weights; computing a weighted sum of the counts with the first weights; and outputting the drug value of the current object label according to the result;
the second data analysis device is configured to perform the following operations when the user request is a drug recommendation analysis:
taking, according to the target drug name in the user request and the drug name in the category label of each item of attribute data, the attribute data corresponding to the target drug name as attribute data to be analyzed;
dividing each item of attribute data to be analyzed into sentences; obtaining the first-subclass words in each item whose semantics correspond to the target drug name; counting, from the sentence division result, a first number of sentences containing the first-subclass words; and outputting the mention volume of the target drug name according to the first number of sentences;
obtaining the indications corresponding to the target drug name from the category label of each item of attribute data to be analyzed; obtaining the second-subclass words in each item whose semantics correspond to those indications; counting, from the sentence division result, a second number of sentences containing the second-subclass words; and outputting the mention volume of the indications of the target drug name according to the second number of sentences;
obtaining, according to the recommendation label of each item of attribute data to be analyzed, the second-class words in each item whose semantics correspond to that recommendation label; counting, from the sentence division result, a third number of sentences containing these second-class words; and outputting the mention volume of the recommendation labels of the target drug name according to the third number of sentences;
and/or,
setting a radius value for each first-class word according to the reciprocal of the degree of correlation of that first-class word in each item of attribute data to be analyzed;
obtaining the first-class words in all attribute data to be analyzed, and grouping identical first-class words together to obtain a word set for each first-class word;
averaging the radius values of all first-class words in each word set, and setting the result as the radius value of the first-class word of that word set;
displaying a graph in which the second-class word is the center of the circle and each first-class word is placed at its radius value; the size of the marker of each first-class word depends on the sum of the numbers of occurrences of that first-class word and the second-class word in a single target attribute data item;
and/or,
grouping the attribute data to be analyzed that share the same type of recommendation label, and counting the items of attribute data to be analyzed for each recommendation label;
obtaining the second weight of each type of recommendation label from a preset correspondence between recommendation label types and second weights, computing a weighted sum of the counts with the second weights, and outputting the recommendation value of the target drug name according to the result.
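The bubble-chart steps combine the degree of correlation with per-word radius averaging. The correlation formula appears in the original only as an image, so this sketch assumes the ratio form R_ij = N_i_and_j / N_i_or_j implied by the variable definitions, and further assumes that N_i_and_j counts documents in which both words appear. The function names are hypothetical.

```python
from collections import defaultdict


def correlation(docs, word_i, word_j):
    """Degree of correlation R_ij = N_i_and_j / N_i_or_j (assumed ratio form).

    docs:      target attribute data, each already segmented into a list of words.
    N_i_and_j: number of documents in which both words appear (assumed counting unit).
    N_i_or_j:  occurrences of word_i plus occurrences of word_j over all documents.
    """
    n_and = sum(1 for doc in docs if word_i in doc and word_j in doc)
    n_or = sum(doc.count(word_i) + doc.count(word_j) for doc in docs)
    return n_and / n_or if n_or else 0.0


def average_radii(entries):
    """Radius per first-class word: mean of 1 / R over its occurrences.

    entries: (first-class word, degree of correlation) pairs, one per item of
             attribute data to be analyzed. Zero correlations are skipped
             because their reciprocal is undefined.
    """
    radii = defaultdict(list)
    for word, r in entries:
        if r > 0:
            radii[word].append(1.0 / r)
    return {word: sum(values) / len(values) for word, values in radii.items()}
```

Because the radius is the reciprocal of the correlation, weakly associated first-class words are drawn farther from the second-class word at the center.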
9. A storage device storing a plurality of program codes, wherein the program codes are adapted to be loaded and run by a processor to perform the method for drug application analysis based on medical big data according to any one of claims 1 to 4.
10. A control device comprising a processor and a storage device adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the method for drug application analysis based on medical big data according to any one of claims 1 to 4.
CN202010495118.1A 2020-06-03 2020-06-03 Medicine application analysis method, system and device based on medicine big data Active CN111681775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010495118.1A CN111681775B (en) 2020-06-03 2020-06-03 Medicine application analysis method, system and device based on medicine big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010495118.1A CN111681775B (en) 2020-06-03 2020-06-03 Medicine application analysis method, system and device based on medicine big data

Publications (2)

Publication Number Publication Date
CN111681775A true CN111681775A (en) 2020-09-18
CN111681775B CN111681775B (en) 2023-09-29

Family

ID=72453886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010495118.1A Active CN111681775B (en) 2020-06-03 2020-06-03 Medicine application analysis method, system and device based on medicine big data

Country Status (1)

Country Link
CN (1) CN111681775B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115831312A (en) * 2022-11-24 2023-03-21 Shanghai Mental Health Center (Shanghai Psychological Counseling and Training Center) Medication abnormality identification method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776711A (en) * 2016-11-14 2017-05-31 Zhejiang University Deep-learning-based method for constructing a Chinese medical knowledge graph
CN107357933A (en) * 2017-08-04 2017-11-17 Liu Yingbo Label description method and apparatus for multi-source heterogeneous scientific and technological information resources
US10146751B1 (en) * 2014-12-31 2018-12-04 Guangsheng Zhang Methods for information extraction, search, and structured representation of text data
CN109920508A (en) * 2018-12-28 2019-06-21 Anhui Provincial Hospital Prescription auditing method and system
CN110119775A (en) * 2019-05-08 2019-08-13 Tencent Technology (Shenzhen) Co., Ltd. Medical data processing method, device, system, equipment and storage medium
CN111177129A (en) * 2019-12-16 2020-05-19 Ping An Property & Casualty Insurance Company of China, Ltd. Label system construction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111681775B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US20230101445A1 (en) Semantic Classification of Numerical Data in Natural Language Context Based on Machine Learning
Saloot et al. Hadith data mining and classification: a comparative analysis
CN105260437B (en) Text classification feature selection approach and its application in biological medicine text classification
KR101377114B1 (en) News snippet generation system and method for generating news snippet
Barhoom et al. Sarcasm Detection in Headline News using Machine and Deep Learning Algorithms
Kumar et al. A deep learning approaches and fastai text classification to predict 25 medical diseases from medical speech utterances, transcription and intent
US20150006531A1 (en) System and Method for Creating Labels for Clusters
CN112035757A (en) Medical waterfall flow pushing method, device, equipment and storage medium
Hsu et al. Multi-label classification of ICD coding using deep learning
Orosoo et al. Performance analysis of a novel hybrid deep learning approach in classification of quality-related English text
Moroney et al. The case for latent variable vs deep learning methods in misinformation detection: An application to covid-19
Sorour et al. AFND: Arabic fake news detection with an ensemble deep CNN-LSTM model
CN111681775A (en) Medicine application analysis method, system and device based on medicine big data
BE1027433A9 (en) A method of extracting information from semi-structured documents, an associated system and a processing device
Siddiqui et al. A Comprehensive Review on Text Classification and Text Mining Techniques Using Spam Dataset Detection
CN111681776B (en) Medical object relation analysis method and system based on medical big data
CN112561714B (en) Nuclear protection risk prediction method and device based on NLP technology and related equipment
Noh et al. Document retrieval for biomedical question answering with neural sentence matching
Segev et al. Context recognition using internet as a knowledge base
KR20220132679A (en) Clinical information search system and method using structure information of natural language
Bhatia et al. An efficient modular framework for automatic LIONC classification of MedIMG using unified medical language
Miran et al. Hate Speech Detection in Social Media (Twitter) Using Neural Network.
Shabbeer et al. Prediction of Sudden Health Crises Owing to Congestive Heart Failure with Deep Learning Models.
Jaculine Priya et al. Machine Learning for Information Extraction, Data Analysis and Predictions in the Healthcare System
US20240004910A1 (en) Systems and methods for systematic literature review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant