CN108984699A - Intelligent question-answering method for drug and poison adverse reactions fusing multi-channel text features - Google Patents
Intelligent question-answering method for drug and poison adverse reactions fusing multi-channel text features
- Publication number
- CN108984699A (application number CN201810728746.2A)
- Authority
- CN
- China
- Prior art keywords
- entry
- concept
- drug
- feature
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An intelligent question-answering method for drug and poison adverse reactions that fuses multi-channel text features. Using natural-language text processing, the method draws on text features from different channels (underlying distribution features, entry item features, entry classification-tag features, and catalog features) and fuses these feature representations with formal concepts to achieve fast, efficient comparison and mapping of drug and poison adverse reactions. It comprises the following steps: constructing formal concepts; constructing multi-channel word vectors; mapping multi-channel feature vectors to formal concepts; and making the comparison decision for drug and poison adverse reactions with random forest decision trees. The method uses a mobile phone as its hardware device. First, the patient provides adverse-reaction symptom information; the algorithm then analyzes the text semantics, quickly compares and screens drugs and poisons, and provides advice and procedures for treating the adverse reaction.
Description
Technical field
The present invention relates to an intelligent question-answering method for drug and poison adverse reactions that fuses multi-channel text features.
Background
Drugs and poisons are of many kinds, and adverse reactions caused by factors such as home renovation, clothing, food, and medication are fairly common. Everyday and clinical manifestations of adverse reactions also vary with constitution, body weight, source, and similar factors, and patients' chief complaints are poorly standardized and take many forms. Helping the public in daily life, or doctors in clinical practice, to compare drugs and poisons qualitatively, attribute their type quickly and intelligently, and narrow the range of adverse-reaction investigation has good decision-support value and practical value for clinical diagnosis and treatment and for preventive health care.
At present, screening for unknown poisons behind adverse reactions in daily life faces the following problems: (1) low concentrations of drugs and poisons in the patient's body make analysis more difficult; (2) poisoning symptoms are hard to distinguish from disease, and the poisoned patient must actively make the association and cooperate; (3) the timeliness of detection methods; (4) drug and poison detection is specialized, with high time and labor costs, so the window for handling a mild reaction is easily missed. Current text-processing methods based on statistical analysis and deep-learning fitting demand high-quality, large-scale domain corpora, whereas adverse-reaction case data for any single drug or poison is very limited. Faced with non-standardized, open-ended descriptions of adverse reactions and cases, feature processing of drug and poison adverse reactions easily runs into text-feature sparsity, which limits how effectively drug and poison cases can be used.
Summary of the invention
The object of the invention is to provide an intelligent question-answering method for drug and poison adverse reactions that fuses multi-channel text features and is accurate, reliable, practical, and low in cost.
The method uses natural-language text processing and draws on text features from different channels, including underlying distribution features, entry item features, entry classification-tag features, and catalog features. It fuses these feature representations with formal concepts to achieve fast, efficient comparison and mapping of drug and poison adverse reactions. The algorithm comprises the following four steps:
1. Construction of formal concepts: obtain open text corpora related to drugs and poisons, and parse them to obtain the corresponding relation formal context and formal concepts. The specific steps are as follows:
(1.1) Relation formal context: define the formal context of domain concepts, that is, a formal context with relations at its core. The relation formal context is defined as a set of ternary relation tuples K, K = (G, M, RI), where G is the set of entry objects; M is the set of annotated entries; and RI takes its values from the set of multi-valued entity associations between G and M. (g, m, ri) ∈ K means that g, under association ri, has value m. The relation formal context is denoted simply as K. The multi-valued association RI in this definition covers every association in form: it can be a general is-a relation, a part-whole relation, a positional relation, a causal relation, or an association or attribute that exists in a specific domain, such as a research attribute, an outline attribute, or a birth attribute, or even an unnamed association. RI can be clear and single, or fuzzy and compound.
(1.2) Relation formal concept: the formal concept with relations at its core is defined as follows:
Under the relation formal context K = (G, M, RI), for sets G0 ⊆ G, M0 ⊆ M, RI0 ⊆ RI, there exist:
(a) a mapping f1: G0 → RI, denoted [formula missing in source];
a mapping RI0 → G, denoted [formula missing in source];
(b) a mapping f2: M0 → RI, denoted [formula missing in source];
a mapping RI0 → M, denoted [formula missing in source].
If the sets satisfy the respective conditions [formula missing in source], then the pair (G0, RI0) is called a subject concept generated under the relation formal context K, and the set of subject concepts is denoted SC = (G0, RI0); the pair (M0, RI0) is called an object concept, and the set of object concepts is denoted OC = (M0, RI0). Below, both kinds of concept are referred to collectively as relation formal concepts RC, with RC = SC ∪ OC.
(1.3) Based on the definitions in steps (1.1) and (1.2), obtain the relation formal context and relation formal concepts of entries from open collaborative databases such as Chinese Wikipedia, Baidu Baike, and drug and poison domain texts.
(1.4) Repeat step (1.3) to iteratively expand the relation formal context and relation formal concepts until the relation formal concept lattice reaches a predetermined size, at which point the initialization of formal concepts ends.
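Steps (1.1), (1.3), and (1.4) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Triple` type, the `expand_context` function, the `fetch_triples` callback, and the toy knowledge base are all hypothetical, and the number of triples is used as a stand-in for the size of the concept lattice, which the source does not specify how to measure.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """One element (g, m, ri) of K: entry g has value m under association ri."""
    g: str   # entry object, an element of G
    m: str   # annotated entry, an element of M
    ri: str  # association between g and m, an element of RI


def expand_context(
    seeds: Iterable[str],
    fetch_triples: Callable[[str], Iterable[Triple]],
    max_size: int,
) -> Set[Triple]:
    """Grow K from seed entries until it holds max_size triples or nothing is left."""
    context: Set[Triple] = set()
    frontier: List[str] = list(seeds)
    visited: Set[str] = set()
    while frontier and len(context) < max_size:
        entry = frontier.pop(0)
        if entry in visited:
            continue
        visited.add(entry)
        for triple in fetch_triples(entry):
            if len(context) >= max_size:
                break
            context.add(triple)
            frontier.append(triple.m)  # newly seen entries are expanded later
    return context


# Toy stand-in for an open collaborative knowledge base (hypothetical data).
TOY_KB = {
    "formaldehyde": [
        Triple("formaldehyde", "indoor pollutant", "is-a"),
        Triple("formaldehyde", "eye irritation", "causes"),
    ],
    "indoor pollutant": [Triple("indoor pollutant", "renovation", "source")],
}

context = expand_context(["formaldehyde"], lambda e: TOY_KB.get(e, []), max_size=10)
G = {t.g for t in context}    # entry objects
M = {t.m for t in context}    # annotated entries
RI = {t.ri for t in context}  # associations
```

Storing K as a set of triples keeps repeated extraction passes idempotent, so re-running step (1.3) over the same entry does not inflate the context.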
2. Construction of multi-channel word vectors: using the relation formal concepts above as the annotation basis, train the multi-channel features of text entries. Training follows the classic Skip-gram model. The specific steps for processing the text features of each channel are as follows:
(2.1) Text syntactic distribution features: perform simple word-segmentation preprocessing on a drug- and poison-related corpus and a general entry corpus;
(2.2) Entry catalog features: the catalog information of an entry characterizes the entry's semantics from one angle. The algorithm takes the entry as the target term and its catalog headings as context terms, trains a neural network on the entry-catalog information, and obtains feature vectors;
(2.3) Entry tag features: entry tags are classification information on the entry's semantics at different granularities. The algorithm takes the entry as the target term and its tags as context terms, trains a neural network on the entry-tag information, and obtains feature vectors;
(2.4) Entry item features: entry items are the attribute information of an entry. The algorithm takes the entry as the target term and its items as context terms, trains a neural network on the entry-item information, and obtains feature vectors.
The feature extraction above does not consider the complete logic of a concept; it only characterizes the semantics of an entry from different angles. The resulting entry features discriminate well but are hard to interpret.
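The way each channel supplies (target, context) pairs to a Skip-gram model can be sketched as follows. This is only an illustration under assumptions: the record layout (`title`, `catalog`, `tags`, `items`) and the function names are hypothetical, and the neural-network training itself is left out.

```python
from collections import defaultdict

# Hypothetical record layout for one knowledge-base entry.
entry = {
    "title": "formaldehyde",
    "catalog": ["physical properties", "toxicity", "first aid"],
    "tags": ["chemical", "indoor pollutant"],
    "items": ["CAS number", "molecular formula"],
}


def channel_pairs(record):
    """Return {channel: [(target, context), ...]} with the entry title as target."""
    pairs = defaultdict(list)
    for channel in ("catalog", "tags", "items"):
        for context_term in record.get(channel, []):
            pairs[channel].append((record["title"], context_term))
    return dict(pairs)


def window_pairs(tokens, window=2):
    """Skip-gram pairs for the distribution channel from one segmented sentence."""
    out = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        out.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return out


pairs_by_channel = channel_pairs(entry)
distribution_pairs = window_pairs(["formaldehyde", "irritates", "eyes"], window=1)
```

Each channel's pair list would then be fed to its own Skip-gram model, giving one vector per entry per channel.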
3. Mapping between multi-channel feature vectors and formal concepts: a concept is regarded as a combination of, and a reference to, features. Different feature combinations form different concepts, and each concept refers to a particular set of features. To characterize domain text concepts, a mapping between feature vectors and formal concepts must be established. The specific steps are as follows:
Using the feature vectors of the different channels and the relation formal concept lattice obtained above, the mapping between features and concepts is trained with the random forest ensemble learning method;
(3.1) With multiple relation formal concepts as labels, each entry is evaluated against only one label at a time in a maximum-entropy calculation, which determines the split value of each vector component;
(3.2) If several concept labels yield the same maximum information gain, the concept with the smaller intent is used for the current split;
(3.3) Repeat the above process until the number of labels in each subset is below a given threshold.
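The split selection in steps (3.1) and (3.2) can be sketched as follows. This is an illustration under assumptions rather than the patent's procedure: entropy-based information gain stands in for the "maximum-entropy calculation", thresholds are taken from the observed component values, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Gain from splitting one vector component at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_split(values, labels):
    """Split value of one component: the observed value with the highest gain."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return max(candidates, key=lambda t: information_gain(values, labels, t), default=None)


def pick_concept(gains, intent_size):
    """Step (3.2): highest gain wins; ties go to the concept with the smaller intent."""
    return min(gains, key=lambda c: (-gains[c], intent_size[c]))
```

For example, `pick_concept({"c1": 0.5, "c2": 0.5}, {"c1": 3, "c2": 1})` selects `"c2"`, because both concepts tie on gain and `"c2"` has fewer attributes in its intent.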
4. Finally, random forest decision trees are used to make the comparison decision for drug and poison adverse reactions. The steps of this process are as follows:
(4.1) Process existing drug and poison case texts to obtain the entry vectors and relation formal concept set of each case;
(4.2) Preprocess the plain text of the adverse-reaction description and obtain as many feature vectors of its corresponding entries as possible;
(4.3) Use the random forest to make the classification decision on the feature vectors, and compute the similarity between the vectors and the relation formal concepts;
(4.4) Based on the multi-channel feature vectors and relation formal concepts, intelligently compare the adverse reaction against drugs and poisons at two semantic levels, the entry level and the concept level;
(4.5) Recommend several candidate suspected drugs or poisons together with their emergency treatment plans.
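The similarity ranking in steps (4.3) and (4.5) can be sketched as follows. The source does not say which similarity measure is used, so cosine similarity is an assumption here, and the random forest classification itself is omitted; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, case_vecs, top_k=3):
    """Return the names of the top_k cases most similar to the query vector."""
    scored = sorted(
        case_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]


# Toy vectors standing in for fused multi-channel features (hypothetical data).
cases = {"poison A": [1.0, 0.0], "poison B": [0.0, 1.0], "drug C": [1.0, 1.0]}
suspects = rank_candidates([1.0, 0.0], cases, top_k=2)
```

Returning several ranked candidates rather than a single answer matches step (4.5), which recommends multiple suspected drugs or poisons.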
The drug and poison adverse-reaction comparison and mapping method of the invention uses a mobile phone as its hardware device. First, the patient provides adverse-reaction symptom information; the algorithm then analyzes the text semantics, quickly compares and screens drugs and poisons, and provides advice and procedures for treating the adverse reaction.
Compared with the prior art, the intelligent question-answering method of the invention has the following advantages:
1. It places low demands on the scale and annotation quality of the text corpus and reduces the influence of feature engineering; features are acquired through more comprehensive channels, which better alleviates feature sparsity;
2. On top of the word-vector feature representation, it introduces more structured concept logic, so that its semantic representation supports both quantitative computation and qualitative interpretation;
3. Because more semantic context is introduced, it adapts better to open-ended text descriptions from users with different backgrounds;
4. The comparison requires no manual intervention; the entire semantic comparison runs automatically, which improves on earlier shallow retrieval and on labor-intensive ontology engineering;
5. The method runs on a mobile device with simple, reliable hardware and software, is easy to use, and offers convenient drug and poison comparison, low cost, and timely screening and response.
Brief description of the drawings
Fig. 1 is a block diagram of the system structure of the invention.
Detailed description of embodiments
An intelligent question-answering method for drug and poison adverse reactions that fuses multi-channel text features. The system used comprises a mobile phone, text semantic feature processing software, and a user. The user enters adverse-reaction symptoms; the method automatically presents the drug and poison comparison results and handling measures to the user on the mobile device, simply, quickly, and in good time.
The hardware requirements of the system are as follows: the mobile phone uses a Kirin 655 processor, with at least 4 GB of memory and at least 2 GB of storage space. The software requirements are as follows: Android 7.0, with Java as the software development platform. Under this minimum configuration, it is recommended that the text description not exceed 1000 characters.
The drug and poison comparison method relies mainly on entry distribution features and the formal context information of open collaborative knowledge bases, so it is less constrained by knowledge engineering. It also takes multi-channel text features into account and is designed to adapt better to open application environments. It can provide core technical support for building an intelligent drug and poison knowledge assistant for the public.
Claims (1)
1. An intelligent question-answering method for drug and poison adverse reactions that fuses multi-channel text features, characterized in that it comprises the following steps:
(1) Construction of formal concepts: obtain open text corpora related to drugs and poisons, and parse them to obtain the corresponding relation formal context and formal concepts, with the following specific steps:
(1.1) Relation formal context: define the formal context of domain concepts, that is, a formal context with relations at its core. The relation formal context is defined as a set of ternary relation tuples K, K = (G, M, RI), where G is the set of entry objects; M is the set of annotated entries; and RI takes its values from the set of multi-valued entity associations between G and M. (g, m, ri) ∈ K means that g, under association ri, has value m. The relation formal context is denoted simply as K;
(1.2) Relation formal concept: the formal concept with relations at its core is defined as follows:
Under the relation formal context K = (G, M, RI), for sets G0 ⊆ G, M0 ⊆ M, RI0 ⊆ RI, there exist:
(a) a mapping f1: G0 → RI, denoted [formula missing in source];
a mapping RI0 → G, denoted [formula missing in source];
(b) a mapping f2: M0 → RI, denoted [formula missing in source];
a mapping RI0 → M, denoted [formula missing in source];
If the sets satisfy the respective conditions [formula missing in source], then the pair (G0, RI0) is called a subject concept generated under the relation formal context K, and the set of subject concepts is denoted SC = (G0, RI0); the pair (M0, RI0) is called an object concept, and the set of object concepts is denoted OC = (M0, RI0). Below, both kinds of concept are referred to collectively as relation formal concepts RC, with RC = SC ∪ OC;
(1.3) Based on the definitions in steps (1.1) and (1.2), obtain the relation formal context and relation formal concepts of entries from open collaborative databases;
(1.4) Repeat step (1.3) to iteratively expand the relation formal context and relation formal concepts until the relation formal concept lattice reaches a predetermined size, at which point the initialization of formal concepts ends;
(2) Construction of multi-channel word vectors: using the relation formal concepts above as the annotation basis, train the multi-channel features of text entries. Training follows the classic Skip-gram model. The specific steps for processing the text features of each channel are as follows:
(2.1) Text syntactic distribution features: perform simple word-segmentation preprocessing on a drug- and poison-related corpus and a general entry corpus;
(2.2) Entry catalog features: the catalog information of an entry characterizes the entry's semantics from one angle. The algorithm takes the entry as the target term and its catalog headings as context terms, trains a neural network on the entry-catalog information, and obtains feature vectors;
(2.3) Entry tag features: entry tags are classification information on the entry's semantics at different granularities. The algorithm takes the entry as the target term and its tags as context terms, trains a neural network on the entry-tag information, and obtains feature vectors;
(2.4) Entry item features: entry items are the attribute information of an entry. The algorithm takes the entry as the target term and its items as context terms, trains a neural network on the entry-item information, and obtains feature vectors;
(3) Mapping between multi-channel feature vectors and formal concepts: a concept is regarded as a combination of, and a reference to, features. Different feature combinations form different concepts, and each concept refers to a particular set of features. To characterize domain text concepts, a mapping between feature vectors and formal concepts must be established, with the following specific steps:
(3.1) With multiple relation formal concepts as labels, each entry is evaluated against only one label at a time in a maximum-entropy calculation, which determines the split value of each vector component;
(3.2) If several concept labels yield the same maximum information gain, the concept with the smaller intent is used for the current split;
(3.3) Repeat the above process until the number of labels in each subset is below a given threshold;
(4) Use random forest decision trees to make the comparison decision for drug and poison adverse reactions, with the following steps:
(4.1) Process existing drug and poison case texts to obtain the entry vectors and relation formal concept set of each case;
(4.2) Preprocess the plain text of the adverse-reaction description and obtain as many feature vectors of its corresponding entries as possible;
(4.3) Use the random forest to make the classification decision on the feature vectors, and compute the similarity between the vectors and the relation formal concepts;
(4.4) Based on the multi-channel feature vectors and relation formal concepts, intelligently compare the adverse reaction against drugs and poisons at two semantic levels, the entry level and the concept level;
(4.5) Recommend several candidate suspected drugs or poisons together with their emergency treatment plans.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810728746.2A CN108984699A (en) | 2018-07-05 | 2018-07-05 | Intelligent question-answering method for drug and poison adverse reactions fusing multi-channel text features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810728746.2A CN108984699A (en) | 2018-07-05 | 2018-07-05 | Intelligent question-answering method for drug and poison adverse reactions fusing multi-channel text features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108984699A true CN108984699A (en) | 2018-12-11 |
Family ID: 64537085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810728746.2A Pending CN108984699A (en) | Intelligent question-answering method for drug and poison adverse reactions fusing multi-channel text features | 2018-07-05 | 2018-07-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108984699A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767817A (en) * | 2019-01-16 | 2019-05-17 | 南通大学 | A kind of drug potential adverse effect discovery method based on neural network language model |
CN115577699A (en) * | 2022-12-09 | 2023-01-06 | 杭州北冥星眸科技有限公司 | Method for determining reasonability of text item, electronic equipment and storage medium |
CN115577699B (en) * | 2022-12-09 | 2023-04-14 | 杭州北冥星眸科技有限公司 | Method for determining text entry reasonableness, electronic equipment and storage medium |
CN116504331A (en) * | 2023-04-28 | 2023-07-28 | 东北林业大学 | Frequency score prediction method for drug side effects based on multiple modes and multiple tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181211 |