CN112331332A - Disease prediction method and system based on multi-granularity feature fusion - Google Patents

Disease prediction method and system based on multi-granularity feature fusion

Info

Publication number
CN112331332A
Authority
CN
China
Prior art keywords
features
fusion
disease
disease prediction
concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011095993.7A
Other languages
Chinese (zh)
Inventor
赵青
李建强
徐春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202011095993.7A
Publication of CN112331332A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a disease prediction method and system based on multi-granularity feature fusion. The method comprises: acquiring fusion features for the disease to be predicted; and inputting the fusion features into a trained disease prediction model to obtain a classification result of the disease type, where the disease prediction model is obtained by training a parallel adaptive convolutional neural network model on the fusion features of multiple diseases. By fusing features of multiple granularities, the embodiment uses not only fine-grained word and concept features but also coarser-grained concept relation and attribute-value features to capture the semantic information in medical text more fully, thereby improving disease prediction performance.

Description

Disease prediction method and system based on multi-granularity feature fusion
Technical Field
The invention relates to the technical field of computers, in particular to a disease prediction method and system based on multi-granularity feature fusion.
Background
Disease prediction uses existing semantic analysis technology to automatically assign diseases to different categories. It can help doctors or patients quickly understand the patient's current stage of disease, and supports scheduling and coordinating key medical resources according to the interventions predicted to be needed.
To date, methods for constructing prediction models fall mainly into two types: hypothesis-driven methods and data-driven methods. The former start from hypotheses that clinical experts make based on observation and clinical experience, then look for facts in medical data and verify the hypotheses by deductive reasoning; the prediction model is derived from the set of validated hypotheses. In general, the hypothesis-driven approach does not take full advantage of the valuable information contained in medical data. The data-driven approach trains machine learning models on fully labeled medical data sets to achieve disease prediction. Traditional machine learning models require domain experts to specify clinical features by hand, and the success of the final prediction model depends largely on carefully supervised, manually designed feature selection. For example, "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques", published by Senthilkumar Mohan et al. in 2019, proposed a hybrid random forest with a linear model for heart disease prediction. Deep learning reduces the complexity of feature selection in traditional machine learning by automatically learning deeper features from data, and has become the main approach for prediction models today. Disease prediction methods based on deep learning usually adopt word or concept vectors as the main feature representation of medical text. For example, the article "training Embedding with Domain Knowledge for Oral Disease Diagnosis Prediction", published by Guangkai Li, Songmao Zhang et al. at SmartCom 2018, learns concepts related to symptoms and diagnoses from a domain ontology and uses a neural network to learn concept features in electronic medical records, in order to construct an oral disease prediction model.
However, word and concept vectors are too fine-grained as features. Relying on them alone tends to extract too little of the semantic information contained in medical text, so that a correct medical decision cannot be provided.
Disclosure of Invention
The embodiment of the invention provides a disease prediction method and system based on multi-granularity feature fusion, which are used for overcoming the defects in the prior art.
In a first aspect, an embodiment of the present invention provides a disease prediction method based on multi-granularity feature fusion, including:
acquiring fusion characteristics based on a disease to be predicted;
inputting the fusion characteristics into a disease prediction model obtained by training to obtain a classification result of the disease types; the disease prediction model is obtained by training fusion characteristics of various diseases based on a parallel self-adaptive convolutional neural network model.
Further, the disease prediction model is obtained by the following steps:
acquiring a text to be processed, and preprocessing the text to be processed to obtain a preprocessed text;
extracting the features of the preprocessed text to obtain extracted features;
fusing the extracted features based on multi-granularity features to obtain fused features of the various diseases;
and acquiring a parallel self-adaptive convolutional neural network model, inputting the fusion characteristics of the various diseases into the parallel self-adaptive convolutional neural network model for training to obtain the disease prediction model.
Further, the obtaining of the text to be processed and the preprocessing of the text to be processed to obtain the preprocessed text specifically include:
manually labeling the medical text data according to the target category to be predicted, and loading a domain ontology to obtain the text to be processed;
and segmenting the text to be processed into Chinese character strings at punctuation marks, digits and spaces, and removing stop words to obtain the preprocessed text.
Further, the extracting features of the preprocessed text to obtain extracted features specifically includes:
and extracting features from the preprocessed text through concept feature extraction, word feature extraction, concept relation feature extraction and attribute-value feature extraction to obtain the extracted features.
Further, extracting features from the preprocessed text through concept feature extraction, word feature extraction, concept relation feature extraction and attribute-value feature extraction specifically includes:
mapping the preprocessed text to the domain ontology to obtain text data, segmenting the text data into a set of semantic units by a maximum matching method, converting the concepts that can be matched in the domain ontology and their concept types into vector form with a word2vec model, and combining the concept's own feature with its concept type feature to extract the concept features;
converting the semantic units for which no matching concept can be found in the domain ontology into vector form with the word2vec model to extract the word features;
extracting relation trigger words between concepts by combining the word features, position features and negation-word features, and combining the relation trigger words with the concept features to form the concept relation features;
and further representing the concept features as disease-time results, whose values are numerical, and test-examination results, whose values are numerical or categorical, to obtain the attribute-value features.
Further, the fusing the extracted features based on multi-granularity features to obtain fused features of the multiple diseases specifically includes:
directly concatenating the extracted feature vectors for prediction targets whose categories differ greatly, or fusing the extracted features with a weight-based feature fusion method for prediction targets whose categories are highly similar, to obtain the fusion features of the multiple diseases.
Further, the obtaining a parallel adaptive convolutional neural network model, inputting the fusion characteristics of the multiple diseases into the parallel adaptive convolutional neural network model for training, and obtaining the disease prediction model specifically includes:
segmenting a sentence into different parts, differently for the concept relation features and for the attribute-value features, to extract the semantic information contained in the sentence;
and fusing the semantic information with the concept features and the word features to train the parallel adaptive convolutional neural network model, applying dropout and zero padding at the convolutional layer to preserve the validity of the sentence, to obtain the disease prediction model.
In a second aspect, an embodiment of the present invention further provides a disease prediction system based on multi-granularity feature fusion, including:
the acquisition module is used for acquiring fusion characteristics based on the disease to be predicted;
the processing module is used for inputting the fusion characteristics to a disease prediction model obtained by training to obtain a classification result of the disease types; the disease prediction model is obtained by training fusion characteristics of various diseases based on a parallel self-adaptive convolutional neural network model.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the steps of the method for predicting a disease based on multi-granular feature fusion as described in any one of the above.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the multi-granular feature fusion based disease prediction method as described in any one of the above.
According to the disease prediction method and system based on multi-granularity feature fusion, provided by the embodiment of the invention, by adopting the multi-granularity feature fusion prediction method, not only are words and concept features of fine granularity adopted, but also semantic information in a medical text is fully understood by adopting concept relations and attribute-value features of larger granularity, so that the performance of model disease prediction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a disease prediction method based on multi-granularity feature fusion according to an embodiment of the present invention;
FIG. 2 is a breakdown of the process into modules according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a disease prediction system based on multi-granularity feature fusion according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the problems in the prior art, the embodiment of the invention provides a disease prediction method based on multi-granularity feature fusion, the method extracts features with different granularities based on the existing medical ontology and labeled corpus and fuses the features to train a disease prediction model, and the trained model can provide a category corresponding to a prediction target and can be used for disease prediction related applications, such as disease type prediction or disease severity prediction.
FIG. 1 is a schematic flow chart of a disease prediction method based on multi-granularity feature fusion according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S1, acquiring fusion features for the disease to be predicted;
S2, inputting the fusion features into a trained disease prediction model to obtain a classification result of the disease type; the disease prediction model is obtained by training a parallel adaptive convolutional neural network model on the fusion features of multiple diseases.
Specifically, fusion features related to the disease to be predicted are obtained and input into a pre-trained disease prediction model to obtain the final classification result of the disease type. The disease prediction model is built on a parallel adaptive convolutional neural network and trained on the fusion features of multiple diseases.
By fusing features of multiple granularities, the embodiment of the invention uses not only fine-grained word and concept features but also coarser-grained concept relation and attribute-value features to capture the semantic information in medical text more fully, thereby improving disease prediction performance.
Based on the above embodiment, the disease prediction model is obtained by the following steps:
acquiring a text to be processed, and preprocessing the text to be processed to obtain a preprocessed text;
extracting the features of the preprocessed text to obtain extracted features;
fusing the extracted features based on multi-granularity features to obtain fused features of the various diseases;
and acquiring a parallel self-adaptive convolutional neural network model, inputting the fusion characteristics of the various diseases into the parallel self-adaptive convolutional neural network model for training to obtain the disease prediction model.
Specifically, as shown in FIG. 2, training the disease prediction model proceeds as follows. First, data preprocessing 1 is performed with the domain ontology to obtain the preprocessed text. Next, feature extraction 2 is applied to the preprocessed text, covering concept features 21, word features 22, concept relation features 23 and attribute-value features 24, to obtain the extracted features. The extracted features are then fused by multi-granularity feature fusion 3, using either direct vector splicing 31 or the feature-weight-based fusion method 32, to obtain the fusion features of multiple diseases. The parallel adaptive convolutional neural network model is trained on these fusion features to obtain the trained disease prediction model 4, which is finally used for disease type classification 5.
Based on any of the above embodiments, the obtaining of the text to be processed and the preprocessing of the text to be processed to obtain the preprocessed text specifically include:
manually labeling the medical text data according to the target category to be predicted, and loading a domain ontology to obtain the text to be processed;
and segmenting the text to be processed into Chinese character strings at punctuation marks, digits and spaces, and removing stop words to obtain the preprocessed text.
Specifically, the medical text data is first labeled manually according to the target category to be predicted, and a domain ontology is then loaded. The text to be processed is segmented into Chinese character strings at punctuation marks, digits and spaces, and stop words are removed to obtain the preprocessed text.
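The segmentation and stop-word step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the punctuation set and the function name `preprocess` are hypothetical, and stop words are assumed to be removed as whole segments, which the source does not specify.

```python
import re

# Hypothetical stop-word list; the source does not enumerate one.
STOP_WORDS = {"的", "了", "和"}

# Split on runs of whitespace, digits and common ASCII / CJK punctuation.
_SPLIT_RE = re.compile(r"[\s\d,.;:!?，。；：！？、（）()]+")


def preprocess(text, stop_words=STOP_WORDS):
    """Split text into Chinese character strings and drop stop-word segments."""
    segments = [s for s in _SPLIT_RE.split(text) if s]
    return [s for s in segments if s not in stop_words]


if __name__ == "__main__":
    print(preprocess("患者 发热3天，咳嗽。的"))
```

A real system would more likely remove stop words after word-level tokenization; the sketch only mirrors the order of operations named in the text.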
Based on any of the above embodiments, the extracting features of the preprocessed text to obtain extracted features specifically includes:
and extracting features from the preprocessed text through concept feature extraction, word feature extraction, concept relation feature extraction and attribute-value feature extraction to obtain the extracted features.
Extracting features from the preprocessed text through concept feature extraction, word feature extraction, concept relation feature extraction and attribute-value feature extraction specifically includes:
mapping the preprocessed text to the domain ontology to obtain text data, segmenting the text data into a set of semantic units by a maximum matching method, converting the concepts that can be matched in the domain ontology and their concept types into vector form with a word2vec model, and combining the concept's own feature with its concept type feature to extract the concept features;
converting the semantic units for which no matching concept can be found in the domain ontology into vector form with the word2vec model to extract the word features;
extracting relation trigger words between concepts by combining the word features, position features and negation-word features, and combining the relation trigger words with the concept features to form the concept relation features;
and further representing the concept features as disease-time results, whose values are numerical, and test-examination results, whose values are numerical or categorical, to obtain the attribute-value features.
Specifically, the method comprises the following four steps: extracting concept features, extracting word features, extracting concept relation features and extracting attribute-value features.
The concept features include the concept's own feature and the concept type feature. First, the preprocessed text is mapped to the domain ontology, and the text data is segmented by a maximum matching method into a set of semantic units $\{Y_1, \dots, Y_n\} \in D$, where $D$ is the text data. The set contains the concepts matched in the domain ontology, $\{C_1, \dots, C_N\} \in Y$, with corresponding concept types $\{C_{1type}, \dots, C_{Ntype}\}$. Second, the concepts and concept types are converted into $d$-dimensional vectors with a word2vec model. Finally, the concept feature is obtained by combining the concept's own feature with its concept type feature, recorded as
$$e_i = c_i \oplus c_{itype}$$
with $e = \{e_1, \dots, e_n\}$, $e_i \in e$, where $c_i$ is the concept's own feature for a concept in $\{C_1, \dots, C_N\}$, $c_{itype}$ is the type feature of concept $c_i$ with type in $\{C_{1type}, \dots, C_{Ntype}\}$, and $\oplus$ is the vector splicing (concatenation) operation.
Word features are the semantic units for which no matching concept can be found in the domain ontology, written $\{W_1, \dots, W_n\} \in Y$. They are likewise converted into $d$-dimensional vectors with word2vec, recorded as $w = \{w_1, \dots, w_n\}$.
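The maximum matching segmentation and the splicing of a concept's own vector with its type vector might look as follows. This is a minimal sketch, not the patented implementation: it assumes forward (left-to-right) maximum matching, a toy ontology given as a dictionary from concept string to concept type, and hand-written vectors standing in for word2vec output. All names (`max_match`, `concept_feature`, `max_len`) are hypothetical.

```python
def max_match(text, ontology, max_len=6):
    """Forward maximum matching.

    At each position take the longest substring found in `ontology`.
    Characters with no match are emitted one at a time with type None,
    i.e. they are candidates for word features rather than concept features.
    """
    units, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in ontology:
                units.append((piece, ontology.get(piece)))
                i += size
                break
    return units


def concept_feature(concept_vec, type_vec):
    """Concept feature: the concept vector spliced with its type vector."""
    return list(concept_vec) + list(type_vec)


if __name__ == "__main__":
    toy_ontology = {"高血压": "disease", "头痛": "symptom"}  # hypothetical entries
    print(max_match("高血压伴头痛", toy_ontology))
    print(concept_feature([0.1, 0.2], [1.0]))
```

Units returned with a type are concepts; units with type `None` correspond to the word features described above.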
Relation trigger words between concepts are extracted by combining the word features, position features and negation-word features. Together with the concept features, they express a concept relation feature as a triple $p_i = (e_i, r_i, e_o)$, with $p = \{p_1, \dots, p_n\}$ and $p_i \in p$, where $e_i$ and $e_o$ are concept features and $r_i$ is the relation trigger word between the concepts. Let $\{s_1, \dots, s_i, \dots, s_n\} \in D$, where sentence $s_i$ consists of $m$ semantic units, $s_i = \{w_1, \dots, e_i, \dots, e_o, \dots, w_m\}$, in which $e_i$ and $e_o$ are the concepts contained in $s_i$,
Figure BDA0002723759910000082
$\{w_1, \dots, w_m\}$ are the words of sentence $s_i$. Each word has two relative distances, one to concept feature $e_i$ and one to $e_o$, which are recorded as
Figure BDA0002723759910000083
Since a negation word can change the meaning of a word, the negation-word feature is extracted by loading a negation-word lexicon and is recorded as $\{n_1, \dots, n_m\} \in w$, where $w$ is the set of word features. Finally, the relation trigger word between concepts can be expressed as
Figure BDA0002723759910000091
where the concept features $e_i$ and $e_o$ and the relation trigger word $r_i$ lie in the same vector space, denoted as
Figure BDA0002723759910000092
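The two position features and the negation-word feature described above can be illustrated with a small Python sketch. It assumes that a relative distance is the signed token offset from each of the two concepts and that negation is a 0/1 flag per token looked up in a lexicon; both conventions, the lexicon contents and the function names are assumptions, since the source gives the formulas only as images.

```python
def relative_positions(tokens, i_idx, o_idx):
    """Signed distance of every token to the two concepts at i_idx and o_idx."""
    return [(j - i_idx, j - o_idx) for j in range(len(tokens))]


def negation_flags(tokens, negation_lexicon):
    """1 where the token is listed in the negation lexicon, else 0."""
    return [1 if t in negation_lexicon else 0 for t in tokens]


if __name__ == "__main__":
    tokens = ["patient", "denies", "chest pain", "after", "exercise"]
    # Hypothetical: concepts sit at indices 2 and 4; lexicon is illustrative.
    print(relative_positions(tokens, 2, 4))
    print(negation_flags(tokens, {"denies", "no", "not"}))
```

In a trained model these integer offsets would typically be mapped to learned position embeddings before being combined with the word vectors.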
Attribute-value features fall into two classes: disease-time and test-examination results. The attribute is a concept feature. A value in a disease-time feature is always numerical, while a value in a test-examination result may be numerical or categorical. For numerical types, both the value and its unit symbol are taken into account: given a value $V_i$ with its unit symbol $U_i$, the updated value is calculated as
Figure BDA0002723759910000093
where $u_i$ is the vector form of the unit symbol. The disease-time feature is denoted $t_i = (e_o, v_m)$. For numerical values in test-examination results, an index level feature must also be extracted: given concepts $C = \{C_1, C_2, \dots, C_n\}$, values $v = \{v_1, v_2, \dots, v_n\}$ and index levels $L = \{L_1, L_2, L_3\}$, the value of an examination result can be represented as a triple $z_i = (e_i, v_i, l_i)$, where $e_o$ and $e_i$ are concept features, $\{v_m, v_i\} \in v$, and $l_i$ is the index level vector. The categorical type has no unit symbol and usually consists of character strings such as "negative" or "positive". Negation-word features must therefore be extracted to express the semantics of the text accurately. For a categorical value without negation words, its category vector is extracted directly; a categorical value with negation words combines the category feature and the negation-word feature and can be expressed as
Figure BDA0002723759910000094
where $b_m$ is the category feature and $n_m$ is the negation-word vector. The categorical type of a test-examination result can therefore be represented as $k_i = (e_m, g_m)$, where $e_m$ is a concept feature. $t_i, z_i, k_i \in q_i$, $q = \{q_1, \dots, q_n\}$, $q_i \in q$, where $q$ is the set of attribute-value features.
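To make the numerical branch of the attribute-value features more concrete, the sketch below pulls a number and its unit symbol out of a text fragment and maps the number to one of three index levels. It is purely illustrative: the regular expression, the encoding of the three levels as 0/1/2 (below, within, above a reference range) and the reference range used in the example are all assumptions, not values from the source and not clinical guidance.

```python
import re

# A number, optionally followed by an ASCII unit symbol such as "mmHg" or "mmol/L".
_VALUE_RE = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z/%]+)?")


def parse_value(text):
    """Return (value, unit) for the first number in text, or None if there is none."""
    match = _VALUE_RE.search(text)
    if match is None:
        return None
    return float(match.group("value")), match.group("unit")


def index_level(value, low, high):
    """Hypothetical three-level index: 0 below range, 1 within range, 2 above range."""
    if value < low:
        return 0
    if value > high:
        return 2
    return 1


if __name__ == "__main__":
    value, unit = parse_value("150 mmHg")
    # The range (90, 140) is a made-up example, not a clinical reference.
    print(value, unit, index_level(value, 90, 140))
```

Categorical results such as "negative" contain no number, so `parse_value` returns `None` for them and they would go through the category and negation-word path instead.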
Based on any of the above embodiments, the fusing the extracted features based on multi-granularity features to obtain the fused features of the multiple diseases specifically includes:
directly concatenating the extracted feature vectors for prediction targets whose categories differ greatly, or fusing the extracted features with a weight-based feature fusion method for prediction targets whose categories are highly similar, to obtain the fusion features of the multiple diseases.
Specifically, the feature fusion method depends on the prediction target. For prediction targets whose categories differ greatly, the extracted features can be concatenated directly; for prediction targets whose categories are highly similar, the weight-based feature fusion method is used. The two methods are described as follows.
When vector splicing is applied directly to the extracted features, the formula can be expressed as:
Figure BDA0002723759910000101
where $e_i$ is a concept feature, $w_i$ a word feature, $p_i$ a concept relation feature and $q_i$ an attribute-value feature.
In the weight-based feature fusion method, different weights are first set for each class of features according to its importance. For example, with 4 weights, the calculation formulas can be expressed as:
Figure BDA0002723759910000102
Figure BDA0002723759910000103
Figure BDA0002723759910000104
Figure BDA0002723759910000105
where $e_i$ is a concept feature, $w_i$ a word feature, $p_i$ a concept relation feature and $q_i$ an attribute-value feature, $\alpha_i \in [0, 1]$, and
Figure BDA0002723759910000106
Next, the weight-based feature values are calculated by combining the weights obtained from the above formulas with the feature vectors:
Figure BDA00027237599100001010
Figure BDA0002723759910000107
Figure BDA0002723759910000108
Figure BDA0002723759910000109
where $CE_i$ is the weight-based concept feature, $WE_i$ the weight-based word feature, $RE_i$ the weight-based concept relation feature and $VE_i$ the weight-based attribute-value feature.
Finally, the weight-based concept features, word features, concept relation features and attribute-value features are fused as the input of the parallel adaptive convolutional neural network to train the disease prediction model:
Figure BDA0002723759910000111
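The two fusion strategies can be contrasted with a short Python sketch. Because the source gives the weight formulas only as images, the sketch makes explicit assumptions: the four weights are supplied by the caller, they lie in [0, 1] and sum to 1, and each weight simply scales its feature vector before concatenation. Function names are hypothetical.

```python
def concat_fusion(e, w, p, q):
    """Direct vector splicing of concept, word, relation and attribute-value features."""
    return list(e) + list(w) + list(p) + list(q)


def weighted_fusion(e, w, p, q, alphas):
    """Scale each feature vector by its weight, then splice the scaled vectors.

    Assumes one weight per feature class, each in [0, 1], summing to 1.
    """
    if len(alphas) != 4:
        raise ValueError("expected one weight per feature class (4 in total)")
    if any(a < 0.0 or a > 1.0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    fused = []
    for alpha, vec in zip(alphas, (e, w, p, q)):
        fused.extend(alpha * x for x in vec)
    return fused


if __name__ == "__main__":
    print(concat_fusion([1.0], [2.0], [4.0], [8.0]))
    print(weighted_fusion([1.0], [2.0], [4.0], [8.0], [0.5, 0.25, 0.125, 0.125]))
```

With equal weights the second function reduces to a uniformly scaled version of the first, which is one way to see why weighting matters mainly when the target categories are hard to tell apart.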
Based on any of the above embodiments, the obtaining a parallel adaptive convolutional neural network model, inputting the fusion characteristics of the multiple diseases into the parallel adaptive convolutional neural network model for training, and obtaining the disease prediction model specifically includes:
segmenting a sentence into different parts, differently for the concept relation features and for the attribute-value features, to extract the semantic information contained in the sentence;
and fusing the semantic information with the concept features and the word features to train the parallel adaptive convolutional neural network model, applying dropout and zero padding at the convolutional layer to preserve the validity of the sentence, to obtain the disease prediction model.
Specifically, a parallel adaptive convolutional neural network is used to train the disease prediction model, with the following formulas.
Convolutional layer: given a sentence $s_i = \{w_1, w_2, \dots, w_m\}$, where $w_j$ is the $j$-th word vector of sentence $s_i$,
Figure BDA0002723759910000112
$h$ is the length of the convolution kernel, meaning that it covers $h$ words. The convolution operation for the $j$-th word is:
$$c_j = f(k \cdot w_{j:j+h-1} + b)$$
where
Figure BDA0002723759910000113
is the convolution kernel matrix, $b$ is the bias, $w_{j:j+h-1}$ denotes the combination of the word vectors from the $j$-th to the $(j+h-1)$-th, and $f(\cdot)$ is a non-linear activation function, usually ReLU. $c_j$ is the feature produced by the convolution operation, and the feature map of sentence $s_i$ is expressed as:
Figure BDA0002723759910000114
Suppose there are $l$ convolution kernels of length $h$, indexed by $i$ with $1 \le i \le l$; the feature maps are then expressed as:
Figure BDA0002723759910000115
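The convolution over word windows can be written out in plain Python to show what a single kernel computes. This is a minimal sketch under assumptions: word vectors are lists of floats, the kernel is an h-by-d matrix, zero padding is added at the end of the sentence so that the feature map has one entry per word, and ReLU is the activation. The padding position and the function names are assumptions; the source only states that zero padding is applied at the convolutional layer.

```python
def relu(x):
    """Rectified linear unit for a single number."""
    return x if x > 0 else 0.0


def conv_feature_map(words, kernel, bias=0.0):
    """Feature map of one kernel: ReLU of kernel-window dot product plus bias.

    `words` is a list of m word vectors of dimension d; `kernel` is a list of
    h rows of dimension d. The sentence is zero-padded at the end so that the
    output has exactly m entries.
    """
    h, d = len(kernel), len(words[0])
    padded = list(words) + [[0.0] * d for _ in range(h - 1)]
    feature_map = []
    for j in range(len(words)):
        window = padded[j:j + h]
        total = sum(kv * wv
                    for k_row, w_row in zip(kernel, window)
                    for kv, wv in zip(k_row, w_row))
        feature_map.append(relu(total + bias))
    return feature_map


if __name__ == "__main__":
    sentence = [[1.0], [2.0], [3.0]]   # three 1-dimensional word vectors
    kernel = [[1.0], [1.0]]            # kernel covering h = 2 words
    print(conv_feature_map(sentence, kernel))
```

Running several kernels over the same sentence and collecting their outputs gives the set of feature maps that the pooling layer then splits into segments.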
Parallel adaptive pooling layer: the sentence is first divided into different parts, differently for the concept relation feature and the attribute-value feature, and the two features are learned in parallel.
Concept relation feature: according to the positions of the concept pair in the sentence, $c_j$ is divided into three parts $[c_{j1}, c_{j2}, c_{j3}]$. The most important information in the sentence is then obtained by taking the maximum of each part, calculated as:
Figure BDA0002723759910000116
Finally, all the feature maps obtained after the convolution operation are concatenated to obtain the feature vector of sentence $s_i$: $b_{sp} = \mathrm{ReLU}(v)$.
Attribute-value feature: according to the position of the concept in the sentence, $c_j$ is divided into two parts $[c_{j1}, c_{j2}]$. Next, the value information most relevant to the concept relationship in the sentence is obtained by taking the maximum value of each part, calculated as:
Figure BDA0002723759910000121
All the feature maps obtained after the convolution operation are concatenated to obtain the feature vector $b_{sq}$ of sentence $s_i$. Finally, the sentence feature vectors of the concept relationship and of the attribute-value are combined to obtain the final sentence feature vector:
Figure BDA0002723759910000122
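As a hedged illustration of the parallel adaptive pooling layer, the sketch below splits each feature map at the concept positions and takes the maximum of each part. The exact pooling and combination formulas are given in the figures and are not reproduced here, so the following points are assumptions of the sketch: the split indices mark where the next part begins, an empty part contributes 0.0, and the two branch vectors are combined by concatenation. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def piecewise_max(feature_map: Sequence[float], cuts: Sequence[int]) -> List[float]:
    """Split feature_map at the cut indices and return the max of each part."""
    bounds = [0, *cuts, len(feature_map)]
    pooled = []
    for start, end in zip(bounds, bounds[1:]):
        part = feature_map[start:end]
        pooled.append(max(part) if part else 0.0)  # empty part -> 0.0 (assumption)
    return pooled


def relu_vec(values: Sequence[float]) -> List[float]:
    return [v if v > 0.0 else 0.0 for v in values]


def parallel_adaptive_pool(
    feature_maps: Sequence[Sequence[float]],
    concept_pair: Tuple[int, int],
    concept_pos: int,
) -> List[float]:
    """Pool every feature map along two branches and join the results."""
    first, second = concept_pair
    v_rel: List[float] = []   # concept relationship branch: three parts per map
    v_attr: List[float] = []  # attribute-value branch: two parts per map
    for fm in feature_maps:
        v_rel.extend(piecewise_max(fm, [first, second]))
        v_attr.extend(piecewise_max(fm, [concept_pos]))
    b_sp = relu_vec(v_rel)    # b_sp = ReLU(v), as stated in the description
    b_sq = v_attr             # concatenated maxima of the second branch
    return b_sp + b_sq        # combination by concatenation (assumption)


maps = [[0.2, 0.9, 0.1, 0.4, 0.7, 0.3]]
sentence_vector = parallel_adaptive_pool(maps, concept_pair=(2, 4), concept_pos=3)
print(sentence_vector)  # [0.9, 0.4, 0.7, 0.9, 0.7]
```

Pooling per part rather than over the whole sentence keeps one maximum for each region delimited by the concepts, which is why the output length depends on the number of parts and kernels rather than on the sentence length.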
Finally, the extracted concept relationship features, attribute-value features, concept features and word features are combined and fed into the classification layer of the parallel adaptive convolutional neural network, and the final classification result of the disease type is generated by a softmax classifier. Depending on the feature fusion method, the output of the classifier is given by the following formulas:
(1) Direct vector concatenation of the extracted features:
Figure BDA0002723759910000123
$O = \mathrm{softmax}(W_o h_i + b_s)$
$r_s = \arg\max(O)$
where $e_i$ is the concept feature, $w_i$ is the word feature, $p_i$ is the concept relationship feature, $q_i$ is the attribute-value feature, $b_s$ is the feature vector of sentence $s_i$, and $W_o$ is the weight; $O \in [1, n]$ indicates that there are $n$ relationship types, and $r_s$ is the final relationship category label.
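For illustration only, the direct concatenation variant followed by the softmax classifier can be sketched as below. The function names, the toy feature values and the toy weights are hypothetical. The additive term is treated here as a vector with one entry per class so that the sum is well-formed, which is an assumption of this sketch rather than something the description states.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_concat(
    features: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    offset: Sequence[float],
) -> Tuple[List[float], int]:
    """Concatenate the feature vectors, apply a linear layer, softmax and argmax."""
    h = [x for feature in features for x in feature]  # direct vector concatenation
    scores = [
        sum(w * x for w, x in zip(row, h)) + b
        for row, b in zip(weights, offset)
    ]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)  # r_s = argmax(O)
    return probs, label


# Toy features: concept, word, concept relationship, attribute-value.
e_i, w_i, p_i, q_i = [1.0], [0.0], [2.0], [1.0]
W_o = [[1.0, 0.0, 0.0, 0.0],   # class 0 looks at the concept feature
       [0.0, 0.0, 1.0, 0.0]]   # class 1 looks at the relationship feature
probs, label = classify_concat([e_i, w_i, p_i, q_i], W_o, offset=[0.0, 0.0])
print(label)  # 1
```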
(2) Weight-based feature fusion:
Figure BDA0002723759910000124
$D = \mathrm{softmax}(W_o f_i + b_s)$
$r_s = \arg\max(D)$
where $CE_i$ denotes the weight-based concept feature, $WE_i$ denotes the weight-based word feature, $RE_i$ denotes the weight-based concept relationship feature, and $VE_i$ denotes the weight-based attribute-value feature; $b_s$ is the feature vector of sentence $s_i$, and $W_o$ is the weight; $D \in [1, n]$ indicates that there are $n$ relationship types, and $r_s$ is the final relationship category label.
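The weight-based variant can be sketched in the same spirit. How the weights are obtained is defined by the fusion formula in the figure and is not reproduced here; the sketch below simply assumes one scalar weight per feature type, applied before concatenation. The function name `weighted_fusion`, the dictionary keys and all numeric values are hypothetical.

```python
from typing import Dict, List, Sequence

FEATURE_ORDER = ("concept", "word", "relation", "attribute_value")


def weighted_fusion(
    features: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Scale each feature vector by its weight, then concatenate in a fixed order."""
    fused: List[float] = []
    for name in FEATURE_ORDER:
        fused.extend(weights[name] * x for x in features[name])
    return fused


features = {
    "concept": [1.0, 2.0],
    "word": [0.5],
    "relation": [4.0],
    "attribute_value": [2.0],
}
weights = {"concept": 0.5, "word": 1.0, "relation": 0.25, "attribute_value": 2.0}
f_i = weighted_fusion(features, weights)
print(f_i)  # [0.5, 1.0, 0.5, 1.0, 4.0]
```

The fused vector would then pass through the same linear layer, softmax and argmax as in the direct concatenation case.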
The disease prediction system based on multi-granularity feature fusion provided by the embodiment of the invention is described below. The system described below and the method described above may be read with reference to each other.
Fig. 3 is a schematic structural diagram of a disease prediction system based on multi-granularity feature fusion according to an embodiment of the present invention. As shown in Fig. 3, the system includes an acquisition module 31 and a processing module 32, wherein:
the acquisition module 31 is configured to acquire fusion features based on a disease to be predicted; the processing module 32 is configured to input the fusion features into a trained disease prediction model to obtain a classification result of the disease type; and the disease prediction model is obtained by training a parallel adaptive convolutional neural network model on the fusion features of multiple diseases.
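Purely as an illustrative sketch, the two modules can be modelled as small classes that wrap a feature extractor and a trained model. The class names, the toy extractor, the toy model and the category labels below are hypothetical stand-ins and do not come from the embodiment.

```python
from typing import Callable, List, Sequence


class AcquisitionModule:
    """Acquires fusion features for the disease to be predicted (module 31)."""

    def __init__(self, extractor: Callable[[str], List[float]]) -> None:
        self._extractor = extractor

    def acquire(self, text: str) -> List[float]:
        return self._extractor(text)


class ProcessingModule:
    """Feeds fusion features to a trained model and returns a label (module 32)."""

    def __init__(self, model: Callable[[Sequence[float]], int], labels: Sequence[str]) -> None:
        self._model = model
        self._labels = list(labels)

    def predict(self, fusion_features: Sequence[float]) -> str:
        return self._labels[self._model(fusion_features)]


def toy_extractor(text: str) -> List[float]:
    # Stand-in for the multi-granularity feature extraction and fusion.
    return [float(text.count("cough")), float(text.count("rash"))]


def toy_model(features: Sequence[float]) -> int:
    # Stand-in for the trained parallel adaptive convolutional network.
    return max(range(len(features)), key=features.__getitem__)


acquisition = AcquisitionModule(toy_extractor)
processing = ProcessingModule(toy_model, ["category A", "category B"])
result = processing.predict(acquisition.acquire("dry cough, cough at night"))
print(result)  # category A
```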
The embodiment of the invention adopts a multi-granularity feature fusion prediction method that uses not only the fine-grained word and concept features but also the coarser-grained concept relationship and attribute-value features, so that the semantic information in the medical text is more fully understood and the disease prediction performance of the model is improved.
Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with one another via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a disease prediction method based on multi-granularity feature fusion, the method comprising: acquiring fusion features based on a disease to be predicted; and inputting the fusion features into a trained disease prediction model to obtain a classification result of the disease type; wherein the disease prediction model is obtained by training a parallel adaptive convolutional neural network model on the fusion features of multiple diseases.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the method for predicting diseases based on multi-granularity feature fusion provided by the above-mentioned method embodiments, where the method includes: acquiring fusion characteristics based on a disease to be predicted; inputting the fusion characteristics into a disease prediction model obtained by training to obtain a classification result of the disease types; the disease prediction model is obtained by training fusion characteristics of various diseases based on a parallel self-adaptive convolutional neural network model.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the disease prediction method based on multi-granularity feature fusion provided in the foregoing embodiments, the method comprising: acquiring fusion features based on a disease to be predicted; and inputting the fusion features into a trained disease prediction model to obtain a classification result of the disease type; wherein the disease prediction model is obtained by training a parallel adaptive convolutional neural network model on the fusion features of multiple diseases.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A disease prediction method based on multi-granularity feature fusion is characterized by comprising the following steps:
acquiring fusion characteristics based on a disease to be predicted;
inputting the fusion characteristics into a disease prediction model obtained by training to obtain a classification result of the disease types; the disease prediction model is obtained by training fusion characteristics of various diseases based on a parallel self-adaptive convolutional neural network model.
2. The method for predicting diseases based on multi-granularity feature fusion according to claim 1, wherein the disease prediction model is obtained by the following steps:
acquiring a text to be processed, and preprocessing the text to be processed to obtain a preprocessed text;
extracting the features of the preprocessed text to obtain extracted features;
fusing the extracted features based on multi-granularity features to obtain fused features of the various diseases;
and acquiring a parallel self-adaptive convolutional neural network model, inputting the fusion characteristics of the various diseases into the parallel self-adaptive convolutional neural network model for training to obtain the disease prediction model.
3. The disease prediction method based on multi-granularity feature fusion as claimed in claim 2, wherein the obtaining of the text to be processed and the preprocessing of the text to be processed to obtain the preprocessed text specifically comprises:
manually labeling the medical text data according to the target category to be predicted, and loading the medical text data into a domain ontology to obtain the text to be processed;
and segmenting the text to be processed into Chinese character strings according to punctuation marks, numbers and space marks, and removing stop words to obtain the preprocessed text.
4. The multi-granularity feature fusion-based disease prediction method according to claim 2, wherein the extracting features from the preprocessed text to obtain extracted features specifically comprises:
and extracting the features of the preprocessed text through conceptual feature extraction, word feature extraction, conceptual relation feature extraction and attribute and value feature extraction to obtain the extracted features.
5. The multi-granularity feature fusion-based disease prediction method according to claim 4, wherein the extracting features are obtained by performing feature extraction on the preprocessed text through conceptual feature extraction, word feature extraction, conceptual relationship feature extraction, and attribute and value feature extraction, and specifically comprises:
mapping the preprocessed text to a domain ontology to obtain text data, segmenting the text data into semantic sets by a maximum matching method, converting the concept self-feature types and the concept type features that can be matched from the domain ontology into vector form using a word2vec model, and extracting the concept features by combining the concept self-feature types and the concept type features;
converting self characteristic types and concept type characteristics which contain concepts which cannot be matched from the domain ontology into a vector form by adopting the word2vec model, and extracting word characteristics;
extracting relation trigger words among concepts by combining the word features, the position features and the negative word features, and representing the concept features and the relation trigger words as concept relation features by combining the concept features;
and further representing the conceptual features as disease and time results containing numerical types and detection and inspection results containing the numerical types and the category types to obtain attribute and value features.
6. The multi-granularity feature fusion-based disease prediction method according to claim 2, wherein the fusion of the extracted features based on the multi-granularity features to obtain the fusion features of the plurality of diseases specifically comprises:
and directly carrying out vector splicing on the extracted features aiming at the category with large difference of the predicted target, or fusing the extracted features by adopting a weight-based feature fusion method aiming at the category with high similarity of the predicted target to obtain the fusion features of the diseases.
7. The method according to claim 5, wherein the obtaining of the parallel adaptive convolutional neural network model and the inputting of the fusion features of the multiple diseases into the parallel adaptive convolutional neural network model for training to obtain the disease prediction model specifically comprises:
segmenting a sentence into different parts according to the difference between the concept relationship characteristic and the attribute and value characteristic to extract semantic information contained in the sentence;
and fusing the semantic information with the concept features and the word features to train the parallel self-adaptive convolutional neural network model, and maintaining the validity of the sentence by adopting dropout operation and zero padding on a convolutional layer to obtain the disease prediction model.
8. A disease prediction system based on multi-granular feature fusion, comprising:
the acquisition module is used for acquiring fusion characteristics based on the disease to be predicted;
the processing module is used for inputting the fusion characteristics to a disease prediction model obtained by training to obtain a classification result of the disease types; the disease prediction model is obtained by training fusion characteristics of various diseases based on a parallel self-adaptive convolutional neural network model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the multi-granular feature fusion based disease prediction method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the multi-granular feature fusion based disease prediction method according to any one of claims 1 to 7.
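For illustration only, the preprocessing recited in claim 3 and the maximum matching segmentation recited in claim 5 might be sketched as follows. The sketch assumes forward maximum matching (the claims do not say whether matching runs forward or backward), and the delimiter set, the ontology vocabulary, the stop words and the sample text are all hypothetical.

```python
import re
from typing import List, Set

# Punctuation marks, digits and whitespace act as delimiters (assumed set).
DELIMITERS = re.compile(r"[\s\d，。；：！？、,.;:!?]+")


def split_text(text: str) -> List[str]:
    """Cut the text into character strings at punctuation, numbers and spaces."""
    return [chunk for chunk in DELIMITERS.split(text) if chunk]


def forward_max_match(chunk: str, vocabulary: Set[str], max_len: int) -> List[str]:
    """Greedily take the longest vocabulary entry starting at each position.

    A single character is emitted when no vocabulary entry matches.
    """
    tokens = []
    i = 0
    while i < len(chunk):
        for size in range(min(max_len, len(chunk) - i), 0, -1):
            candidate = chunk[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def preprocess(text: str, vocabulary: Set[str], stop_words: Set[str]) -> List[str]:
    max_len = max(len(term) for term in vocabulary)
    tokens = []
    for chunk in split_text(text):
        tokens.extend(forward_max_match(chunk, vocabulary, max_len))
    return [t for t in tokens if t not in stop_words]  # remove stop words


ontology_terms = {"高血压", "糖尿病", "患者"}   # hypothetical ontology concepts
stop_words = {"有", "年"}                       # hypothetical stop words
tokens = preprocess("患者有高血压，糖尿病3年。", ontology_terms, stop_words)
print(tokens)  # ['患者', '高血压', '糖尿病']
```

Tokens found in the vocabulary correspond to concepts matched from the domain ontology; the remaining tokens would be handled as ordinary words.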
CN202011095993.7A 2020-10-14 2020-10-14 Disease prediction method and system based on multi-granularity feature fusion Pending CN112331332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011095993.7A CN112331332A (en) 2020-10-14 2020-10-14 Disease prediction method and system based on multi-granularity feature fusion

Publications (1)

Publication Number Publication Date
CN112331332A true CN112331332A (en) 2021-02-05

Family

ID=74314917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011095993.7A Pending CN112331332A (en) 2020-10-14 2020-10-14 Disease prediction method and system based on multi-granularity feature fusion

Country Status (1)

Country Link
CN (1) CN112331332A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109192299A (en) * 2018-08-13 2019-01-11 中国科学院计算技术研究所 A kind of medical analysis auxiliary system based on convolutional neural networks
CN109800437A (en) * 2019-01-31 2019-05-24 北京工业大学 A kind of name entity recognition method based on Fusion Features
CN110297908A (en) * 2019-07-01 2019-10-01 中国医学科学院医学信息研究所 Diagnosis and treatment program prediction method and device
CN111079377A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Method for recognizing named entities oriented to Chinese medical texts
CN111223553A (en) * 2020-01-03 2020-06-02 大连理工大学 Two-stage deep migration learning traditional Chinese medicine tongue diagnosis model
CN111274397A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Method and device for establishing entity relationship detection model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯钦林;杨志豪;林鸿飞;: "疾病-病症和病症-治疗物质的关系抽取研究", 计算机工程与应用, no. 10, 9 June 2017 (2017-06-09), pages 251 - 257 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035362A (en) * 2021-02-26 2021-06-25 北京工业大学 Medical prediction method and system based on semantic graph network
CN113035362B (en) * 2021-02-26 2024-04-09 北京工业大学 Medical prediction method and system based on semantic graph network
CN115579128A (en) * 2022-10-19 2023-01-06 内蒙古卫数数据科技有限公司 Multi-model feature-enhanced disease screening system
CN115579128B (en) * 2022-10-19 2023-11-21 内蒙古卫数数据科技有限公司 Multi-model characteristic enhanced disease screening system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination