CN108984699A - Intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features - Google Patents


Info

Publication number
CN108984699A
Authority
CN
China
Prior art keywords
entry
concept
drug
feature
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810728746.2A
Other languages
Chinese (zh)
Inventor
程春雷
胡晓镭
杜建强
雷杰言
徐文达
朱彦陈
李智彪
赵辉
叶云
卢元元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Traditional Chinese Medicine
Original Assignee
Jiangxi University of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Traditional Chinese Medicine
Priority to CN201810728746.2A
Publication of CN108984699A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features. The method uses natural-language text processing and draws on text features from different channels, including low-level distributional features, entry item features, entry category-label features, and entry catalog features. It fuses these feature representations with formal concepts to give a fast, efficient algorithm for comparing and mapping adverse reactions to drugs and poisons. The method comprises the following steps: construction of formal concepts; construction of multi-channel word vectors; mapping between multi-channel feature vectors and formal concepts; and a comparison decision on the adverse reaction using random-forest decision trees. The hardware used is a mobile phone. First, the patient provides information on the adverse-reaction symptoms; the algorithm then analyzes the text semantics, quickly compares and screens candidate drugs and poisons, and provides advice on handling the adverse reaction.

Description

Intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features
Technical field
The present invention relates to an intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features.
Background art
Drugs and poisons come in many varieties, and adverse reactions caused by home renovation, clothing, food, medication, and similar factors are relatively common. Everyday and clinical adverse reactions also present differently depending on constitution, body weight, source, and so on, and patients' descriptions of their complaints are poorly standardized and vary in form. Helping people in daily life, or doctors in the clinic, to quickly and intelligently compare drugs and poisons qualitatively, assign them to types, and narrow the scope of investigation of an adverse reaction therefore has good decision-support value and practical value for clinical diagnosis and treatment as well as for preventive health care.
At present, screening for the unknown poison behind an adverse reaction in daily life faces the following problems: (1) low concentrations of drugs or poisons in the patient's body are difficult to analyze; (2) poisoning symptoms are hard to distinguish from disease, and the poisoned patient must actively recall associations and cooperate; (3) detection methods are not timely; (4) drug and poison testing is specialized, and its time and labor costs are high, so the window for handling a mild reaction is easily missed. Current text-processing methods based on statistical analysis and deep-learning fitting demand high-quality, large-scale domain corpora, but adverse-reaction case data for any single drug or poison is very limited. Faced with non-standard, open-ended descriptions of adverse reactions and cases, feature processing easily runs into text-feature sparsity, which limits how effectively drug and poison cases can be used.
Summary of the invention
The object of the invention is to provide an intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features and is accurate, reliable, practical, and low in cost.
The intelligent question-answering method of the invention uses natural-language text processing and draws on text features from different channels, including low-level distributional features, entry item features, entry category-label features, and entry catalog features. It fuses these feature representations with formal concepts to give a fast, efficient algorithm for comparing and mapping adverse reactions to drugs and poisons, and comprises the following four steps:
1. Construction of formal concepts: obtain open text corpora related to drugs and poisons, and parse them to obtain the corresponding relation formal context and formal concepts. The specific steps are as follows:
(1.1) Relation formal context: define the formal context of domain concepts, that is, a relation formal context with relations at its core. The relation formal context is defined as a set of ternary relation tuples $K = (G, M, RI)$, where $G$ is the set of entry objects; $M$ is the set of annotation entries; and $RI$ is the set of multi-valued entity associations between $G$ and $M$. $(g, m, ri) \in K$ means that $g$ takes the value $m$ under association $ri$. The relation formal context is denoted simply as $K$. In this definition, the multi-valued association $RI$ covers every association in the formal sense: general is-a, part-of, positional, and causal relations, or associations and attributes specific to a domain, such as a research attribute, an outline attribute, a birth attribute, or even unnamed associations. $RI$ may be crisp and single, or fuzzy and compound.
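(1.2) Relation formal concept: a formal concept with relations at its core is defined as follows:
Under the relation formal context $K = (G, M, RI)$, for sets $G_0 \subseteq G$, $M_0 \subseteq M$, $RI_0 \subseteq RI$, there exist:
(a) a mapping $f_1: G_0 \to RI$, denoted [formula missing in the source text];
a mapping [symbol missing in the source text] $: RI_0 \to G$, denoted [formula missing in the source text];
(b) a mapping $f_2: M_0 \to RI$, denoted [formula missing in the source text];
a mapping [symbol missing in the source text] $: RI_0 \to M$, denoted [formula missing in the source text].
If the sets respectively satisfy the conditions [formula missing in the source text], then the pair $(G_0, RI_0)$ is called a subject concept generated under the relation formal context $K$, and the set of subject concepts is denoted $SC = (G_0, RI_0)$; the pair $(M_0, RI_0)$ is called an object concept, and the set of object concepts is denoted $OC = (M_0, RI_0)$. Below, both kinds of concept are referred to collectively as relation formal concepts $RC$, with $RC = SC \cup OC$.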
(1.3) Based on the definitions in steps (1.1) and (1.2), obtain the relation formal context and relation formal concepts of entries from open collaborative databases such as Chinese Wikipedia, Baidu Baike, and drug-and-poison domain texts.
(1.4) Repeat step (1.3) to iteratively expand the relation formal context and the relation formal concepts until the relation formal concept lattice reaches a predetermined size, at which point initialization of the formal concepts ends.
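The relation formal context of step 1 can be illustrated with a small sketch. The source text does not preserve the formulas that define the mappings, so the two helper functions below are an assumption modelled on the usual derivation operators of formal concept analysis: `relations_of` plays the role of $f_1$ and `objects_of` the role of its inverse. All names and the toy triples are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# A triple (g, m, ri) means: entry g takes value m under association ri.
Triple = Tuple[str, str, str]


def relations_of(context: Set[Triple], objects: Set[str]) -> FrozenSet[str]:
    """Assumed f1: associations shared by every entry in `objects`."""
    if not objects:
        return frozenset(ri for _, _, ri in context)
    per_object = [{ri for g, _, ri in context if g == obj} for obj in objects]
    return frozenset(set.intersection(*per_object))


def objects_of(context: Set[Triple], relations: Set[str]) -> FrozenSet[str]:
    """Assumed inverse of f1: entries that take part in every given association."""
    all_objects = {g for g, _, _ in context}
    return frozenset(
        obj for obj in all_objects
        if relations <= {ri for g, _, ri in context if g == obj}
    )


def is_subject_concept(context: Set[Triple], objects: Set[str], relations: Set[str]) -> bool:
    """(G0, RI0) is treated as a subject concept when the two mappings close on each other."""
    return (relations_of(context, objects) == frozenset(relations)
            and objects_of(context, relations) == frozenset(objects))


# Toy relation formal context K (illustrative data only).
K: Set[Triple] = {
    ("aspirin", "drug", "is-a"),
    ("aspirin", "stomach bleeding", "causes"),
    ("warfarin", "drug", "is-a"),
    ("warfarin", "bleeding", "causes"),
    ("formaldehyde", "poison", "is-a"),
}

print(is_subject_concept(K, {"aspirin", "warfarin"}, {"is-a", "causes"}))
print(is_subject_concept(K, {"aspirin"}, {"is-a", "causes"}))
```

Object concepts $(M_0, RI_0)$ could be checked the same way by indexing the triples on the value $m$ instead of the entry $g$.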
2. Construction of multi-channel word vectors: taking the above relation formal concepts as the annotation basis, train on the multi-channel features of text entries. Training follows the classic Skip-gram model. The specific steps for processing the text features of the different channels are as follows:
(2.1) Text distribution features: based on the original drug-and-poison corpus and a general-purpose entry corpus, perform simple word-segmentation preprocessing;
(2.2) Processing of entry catalog features: the catalog (table-of-contents) information of an entry characterizes the semantics of the entry from a certain angle. Taking the entry as the target term and the catalog terms as context terms, the algorithm trains a neural network on the entry-catalog information to obtain feature vectors;
(2.3) Processing of entry label features: entry labels are classification information about the entry's semantics at different granularities. Taking the entry as the target term and the label terms as context terms, the algorithm trains a neural network on the entry-label information to obtain feature vectors;
(2.4) Processing of entry item features: entry items are the attribute information of an entry. Taking the entry as the target term and the item terms as context terms, the algorithm trains a neural network on the entry-item information to obtain feature vectors;
The feature extraction above does not consider the complete logic of a concept; it only characterizes the semantics an entry carries from different angles. The resulting entry features discriminate well but are hard to interpret.
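As a minimal sketch of how steps (2.2) to (2.4) could feed a Skip-gram trainer, the snippet below turns each channel into (target term, context term) pairs, with the entry as the target. The data layout, the channel names, and the function name are assumptions for illustration; the source does not specify them, and the actual neural-network training is left to whichever Skip-gram implementation is used.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: entry -> channel name -> terms found in that channel.
EntryChannels = Dict[str, Dict[str, List[str]]]


def channel_pairs(entries: EntryChannels, channel: str) -> List[Tuple[str, str]]:
    """Build Skip-gram style (target, context) pairs for one channel."""
    pairs: List[Tuple[str, str]] = []
    for entry, channels in entries.items():
        for context_term in channels.get(channel, []):
            pairs.append((entry, context_term))
    return pairs


# Toy entries (illustrative data only).
entries: EntryChannels = {
    "aspirin": {
        "catalog": ["pharmacology", "adverse effects"],
        "label": ["analgesic"],
        "item": ["CAS number"],
    },
    "formaldehyde": {
        "catalog": ["toxicity"],
        "label": ["toxic gas"],
    },
}

# One training set per channel; each would yield its own feature vectors.
pairs_by_channel = {name: channel_pairs(entries, name) for name in ("catalog", "label", "item")}
for name, pairs in pairs_by_channel.items():
    print(name, pairs)
```

Keeping one pair list per channel mirrors the text: each channel is trained separately and contributes its own vector for the same entry.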
3. Mapping between multi-channel feature vectors and formal concepts: a concept is regarded as a combination of, and a reference to, features; that is, different feature combinations form different concepts, and different concepts refer to particular sets of features. To characterize the concepts of domain text, a mapping between feature vectors and formal concepts must be established. The specific steps are as follows:
Using the feature vectors of the different channels and the relation formal concept lattice obtained above, the mapping between features and concepts is trained with the random-forest ensemble learning method:
(3.1) With multiple relation formal concepts as labels, each entry takes only one label at a time for the maximum-entropy calculation, which determines the split value of each vector component;
(3.2) If several concept labels have the same maximum information gain, the concept with the smaller intension is used for the current split;
(3.3) Repeat the above process until the number of labels in each subset is below a certain threshold.
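The split rule in steps (3.1) and (3.2) can be sketched as follows. This is an illustration under assumptions, not the patented procedure: it uses the standard Shannon-entropy information gain for a threshold split on one vector component, and it breaks ties between concept labels by the size of the concept's intension. The function names, the exact tie test, and the toy numbers are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain from splitting one vector component at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_split(values: Sequence[float], labels: Sequence[str]) -> Optional[float]:
    """Threshold with the highest gain, or None if the component is constant."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t), default=None)


def pick_concept(gains: Dict[str, float], intension_size: Dict[str, int]) -> str:
    """Highest gain wins; on equal gain, prefer the concept with the smaller intension."""
    return min(gains, key=lambda c: (-gains[c], intension_size[c]))


# Toy component values and concept labels (illustrative data only).
component = [0.1, 0.2, 0.8, 0.9]
concept_labels = ["drug", "drug", "poison", "poison"]
threshold = best_split(component, concept_labels)
print(threshold, information_gain(component, concept_labels, threshold))
print(pick_concept({"c1": 0.5, "c2": 0.5}, {"c1": 3, "c2": 1}))
```

In a full random forest this split search would run on random subsets of entries and vector components for each tree; the sketch shows only the per-node decision.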
4. Finally, use random-forest decision trees to make the comparison decision on the adverse reaction to drugs and poisons. The steps are as follows:
(4.1) Process the text of existing drug and poison cases to obtain the entry vectors and the relation formal concept set of each case;
(4.2) Preprocess the plain text of the adverse-reaction description to obtain as many feature vectors of its corresponding entries as possible;
(4.3) Based on the random forest, make the classification decision on the feature vectors and calculate the similarity between the vectors and the relation formal concepts;
(4.4) Based on the multi-channel feature vectors and the relation formal concepts, perform an intelligent comparison of the adverse reaction against drugs and poisons at two semantic levels, entry and concept;
(4.5) Recommend several candidate suspected drugs or poisons together with their emergency treatment plans.
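Steps (4.3) and (4.5) mention a similarity calculation and a ranked recommendation without naming the similarity measure. The sketch below assumes cosine similarity between the feature vector of the user's description and a stored vector per known case, then returns the top candidates. The measure, the function names, and the toy vectors are assumptions for illustration; the random-forest classification itself is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: Sequence[float],
                    cases: Dict[str, Sequence[float]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every known case against the query and keep the best `top_k`."""
    scored = [(name, cosine(query, vec)) for name, vec in cases.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy case vectors and a query vector (illustrative data only).
case_vectors = {
    "formaldehyde": [1.0, 0.0],
    "aspirin": [0.0, 1.0],
}
ranking = rank_candidates([0.9, 0.1], case_vectors, top_k=2)
print(ranking)
```

Each returned name would then be paired with its stored emergency treatment plan before being shown to the user.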
The comparison and mapping method of the invention uses a mobile phone as its hardware. First, the patient provides information on the adverse-reaction symptoms; the algorithm then analyzes the text semantics, quickly compares and screens candidate drugs and poisons, and provides advice on handling the adverse reaction.
Compared with the prior art, the intelligent question-answering method of the invention has the following advantages:
1. It places low demands on the scale and annotation quality of the text corpus, reduces the impact of feature engineering, and acquires features through more comprehensive channels, which better alleviates feature sparsity;
2. On top of the word-vector feature representation, it introduces more structured concept logic, so that its semantic representation supports both quantitative computation and qualitative interpretation;
3. Because more semantic context is introduced, it adapts better to open-ended text descriptions from users of different backgrounds;
4. The comparison requires no manual intervention and the whole semantic comparison runs automatically, which improves on earlier shallow retrieval and on labor-intensive ontology engineering;
5. The method runs on a mobile device with simple, reliable software and hardware; it is easy to use and offers convenient comparison of drugs and poisons, low cost, and timely screening and response.
Brief description of the drawings
Fig. 1 is a block diagram of the system structure of the invention.
Detailed description of embodiments
An intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features. The system used comprises a mobile phone, text semantic feature processing software, and a user. The user enters the adverse-reaction symptoms; the method automatically compares them against drugs and poisons, and the results and handling measures are presented to the user on the mobile device simply, quickly, and in good time.
The hardware requirements of the system are as follows: a mobile phone with a Kirin 655 processor, at least 4 GB of memory, and at least 2 GB of storage space. The software requirements are as follows: Android 7.0, with Java as the software development platform. Under this minimum configuration, it is recommended that the text description not exceed 1000 characters.
The comparison method relies mainly on entry distribution features and on the formal context information of open collaborative knowledge bases, so it is less constrained by knowledge engineering. Because it takes multi-channel text features into account, it is designed to adapt better to open application environments and can provide core technical support for building an intelligent drug-and-poison knowledge assistant.

Claims (1)

1. An intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features, characterized in that it comprises the following steps:
(1) Construction of formal concepts: obtain open text corpora related to drugs and poisons, and parse them to obtain the corresponding relation formal context and formal concepts, with the following specific steps:
(1.1) Relation formal context: define the formal context of domain concepts, that is, a relation formal context with relations at its core; the relation formal context is defined as a set of ternary relation tuples $K = (G, M, RI)$, where $G$ is the set of entry objects, $M$ is the set of annotation entries, and $RI$ is the set of multi-valued entity associations between $G$ and $M$; $(g, m, ri) \in K$ means that $g$ takes the value $m$ under association $ri$; the relation formal context is denoted simply as $K$;
(1.2) Relation formal concept: a formal concept with relations at its core is defined as follows:
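Under the relation formal context $K = (G, M, RI)$, for sets $G_0 \subseteq G$, $M_0 \subseteq M$, $RI_0 \subseteq RI$, there exist:
(a) a mapping $f_1: G_0 \to RI$, denoted [formula missing in the source text];
a mapping [symbol missing in the source text] $: RI_0 \to G$, denoted [formula missing in the source text];
(b) a mapping $f_2: M_0 \to RI$, denoted [formula missing in the source text];
a mapping [symbol missing in the source text] $: RI_0 \to M$, denoted [formula missing in the source text];
If the sets respectively satisfy the conditions [formula missing in the source text], then the pair $(G_0, RI_0)$ is called a subject concept generated under the relation formal context $K$, and the set of subject concepts is denoted $SC = (G_0, RI_0)$; the pair $(M_0, RI_0)$ is called an object concept, and the set of object concepts is denoted $OC = (M_0, RI_0)$; below, both kinds of concept are referred to collectively as relation formal concepts $RC$, with $RC = SC \cup OC$;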
(1.3) Based on the definitions in steps (1.1) and (1.2), obtain the relation formal context and relation formal concepts of entries from open collaborative databases;
(1.4) Repeat step (1.3) to iteratively expand the relation formal context and the relation formal concepts until the relation formal concept lattice reaches a predetermined size, at which point initialization of the formal concepts ends;
(2) Construction of multi-channel word vectors: taking the above relation formal concepts as the annotation basis, train on the multi-channel features of text entries, following the classic Skip-gram model, with the following specific steps for processing the text features of the different channels:
(2.1) Text distribution features: based on the original drug-and-poison corpus and a general-purpose entry corpus, perform simple word-segmentation preprocessing;
(2.2) Processing of entry catalog features: the catalog information of an entry characterizes the semantics of the entry from a certain angle; taking the entry as the target term and the catalog terms as context terms, the algorithm trains a neural network on the entry-catalog information to obtain feature vectors;
(2.3) Processing of entry label features: entry labels are classification information about the entry's semantics at different granularities; taking the entry as the target term and the label terms as context terms, the algorithm trains a neural network on the entry-label information to obtain feature vectors;
(2.4) Processing of entry item features: entry items are the attribute information of an entry; taking the entry as the target term and the item terms as context terms, the algorithm trains a neural network on the entry-item information to obtain feature vectors;
(3) Mapping between multi-channel feature vectors and formal concepts: a concept is regarded as a combination of, and a reference to, features, that is, different feature combinations form different concepts, and different concepts refer to particular sets of features; to characterize the concepts of domain text, a mapping between feature vectors and formal concepts must be established, with the following specific steps:
(3.1) With multiple relation formal concepts as labels, each entry takes only one label at a time for the maximum-entropy calculation, which determines the split value of each vector component;
(3.2) If several concept labels have the same maximum information gain, the concept with the smaller intension is used for the current split;
(3.3) Repeat the above process until the number of labels in each subset is below a certain threshold;
(4) Use random-forest decision trees to make the comparison decision on the adverse reaction to drugs and poisons, with the following steps:
(4.1) Process the text of existing drug and poison cases to obtain the entry vectors and the relation formal concept set of each case;
(4.2) Preprocess the plain text of the adverse-reaction description to obtain as many feature vectors of its corresponding entries as possible;
(4.3) Based on the random forest, make the classification decision on the feature vectors and calculate the similarity between the vectors and the relation formal concepts;
(4.4) Based on the multi-channel feature vectors and the relation formal concepts, perform an intelligent comparison of the adverse reaction against drugs and poisons at two semantic levels, entry and concept;
(4.5) Recommend several candidate suspected drugs or poisons together with their emergency treatment plans.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810728746.2A | 2018-07-05 | 2018-07-05 | Intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810728746.2A | 2018-07-05 | 2018-07-05 | Intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features

Publications (1)

Publication Number | Publication Date
CN108984699A (en) | 2018-12-11

Family

ID=64537085

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810728746.2A (Pending; published as CN108984699A (en)) | Intelligent question-answering method for adverse reactions to drugs and poisons that fuses multi-channel text features | 2018-07-05 | 2018-07-05

Country Status (1)

Country | Link
CN (1) | CN108984699A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109767817A * | 2019-01-16 | 2019-05-17 | 南通大学 | Method for discovering potential adverse drug reactions based on a neural network language model
CN115577699A * | 2022-12-09 | 2023-01-06 | 杭州北冥星眸科技有限公司 | Method for determining reasonability of text item, electronic equipment and storage medium
CN115577699B * | 2022-12-09 | 2023-04-14 | 杭州北冥星眸科技有限公司 | Method for determining text entry reasonableness, electronic equipment and storage medium
CN116504331A * | 2023-04-28 | 2023-07-28 | 东北林业大学 | Frequency score prediction method for drug side effects based on multiple modes and multiple tasks

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181211