CN107516110A - A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding - Google Patents

A medical question-and-answer semantic clustering method based on integrated convolutional encoding


Publication number
CN107516110A
CN107516110A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710723583.4A
Other languages
Chinese (zh)
Other versions
CN107516110B (en
Inventor
余志文
戴丹
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710723583.4A priority Critical patent/CN107516110B/en
Publication of CN107516110A publication Critical patent/CN107516110A/en
Application granted granted Critical
Publication of CN107516110B publication Critical patent/CN107516110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23211: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with adaptive number of clusters
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention discloses a medical question-and-answer semantic clustering method based on integrated convolutional encoding, in the field of machine learning. The method comprises the following steps: collecting question-and-answer corpora from users of a medical consultation platform; selecting convolution kernels; merging the feature representations produced by different convolution kernels; obtaining the final data representation with an auto-encoder; and performing semantic clustering on the medical questions and answers. Compared with traditional deep learning methods, this method extracts different features with different convolution kernels, so the extracted features are more complete and diverse, and it fuses the extracted features using several different feature combination strategies. The invention therefore generalizes well and achieves high semantic clustering accuracy; it can help users better understand their own condition, assist doctors in disease detection, and has great application value for building medical automatic question-answering systems.

Description

A medical question-and-answer semantic clustering method based on integrated convolutional encoding
Technical field
The present invention relates to the field of artificial intelligence, especially machine learning, and in particular to a medical question-and-answer semantic clustering method based on integrated convolutional encoding.
Background technology
With the rapid development of the Internet, people's way of life is gradually changing. Surveys show that when ordinary users feel physically unwell, about 90% of them search the Internet for related information; the Internet is therefore also changing the ecology of healthcare. In Internet-based medicine, online triage (disease guidance) is an important and critical step, so many online disease question-and-answer websites have appeared in the health-related medical field. By describing their own experience, detailed symptoms, medication and treatment, patients communicate with doctors and obtain disease-related care knowledge. These disease question-and-answer records contain a large amount of disease information from individual cases. If patients' descriptions of the signs of a disease can be obtained from these medical question-and-answer corpora, a great deal of useful information can be mined and understood: retrospective analysis and disease prediction become possible for patients, and a medical automatic question-answering system can more easily understand the questions patients ask. This is of great significance for building intelligent healthcare.
Medical corpus text data is noisy, sparse, high-dimensional, heterogeneous, incomplete and systematically biased, and patients with the same symptoms describe them in different ways. The conventional approach is to have experts choose suitable feature patterns and represent them in a special-purpose way, but this approach defines the feature-space scale with insufficient supervision, generalizes poorly, and misses the opportunity to discover new patterns and features, so conventional methods struggle to characterize and model such data. Unsupervised representation learning, by automatically identifying the patterns and dependencies in the data, tries to overcome the limitations of supervised feature-space definition and to associate a concise, general representation; the automatic acquisition of knowledge makes it simpler to extract usable information when building a classifier or another predictor. In recent years deep learning has been applied more and more widely to image recognition, machine translation, intelligent question answering and other tasks. Its essence is to learn more accurate features by building machine learning models with multiple hidden layers and training them on massive data; it can better capture the internal information of the data and has significant advantages for analyzing unstructured, variable, cross-domain big data, giving such information and knowledge a good representation. In the medical field, however, deep learning techniques are not yet widely applied, and have not been reliably integrated into workflows such as medical intelligent question answering.
At present, governments, major medical institutions and research institutes at home and abroad have invested great manpower, material and financial resources in intelligent healthcare. Abroad, regional collaborative healthcare research started relatively early in countries such as the United States, Canada, the United Kingdom and Australia. Representative work includes Google's use of denoising auto-encoder representation learning on patients' electronic health records to construct a feature space, so as to perform disease prediction for patients and provide related health guidance. Domestically, Ding Xiaohong [Research on health resource sharing mechanisms and question-and-answer recommendation techniques, Xidian University, 2011], on the basis of studying medical and health metadata standards, provided the steps for realizing resource sharing, including resource standardization and data integration; explored a medical resource sharing platform with metadata as its core, "physically distributed, logically unified", based on an SOA architecture; proposed a hierarchical sharing model; and analyzed this sharing platform from five different views. Liu Fang et al. [Research and implementation of an intelligent question-answering system for the medical industry, Microelectronics and Computer, 2012, 11:95-98] proposed a method for understanding users' natural-language questions based on human-computer interaction, in order to realize an intelligent question-answering system for diseases, diagnosis and treatment in medical informatics. Li Chao [Research and application of intelligent triage and medical question-answering methods, Dalian University of Technology, 2016] used big-data technology to study disease triage and automatic disease-knowledge question answering, constructing an online triage model with convolutional neural network models and natural language processing techniques, improving consultation data and feature construction, and providing human-like, authoritative, content-rich medical knowledge. Gong Jibing et al. [A user health-status detection method for medical community networks based on a probabilistic factor graph model, Journal of Computer Research and Development, 2013, 50(6):1285-1296] proposed a new detection and prediction method for network users' health status based on a temporal-spatial factor graph model (TS-FGM), and systematically discussed how the health status of a node in a dynamic social network can be detected and predicted under new conditions, and to what degree different factors influence users' health status.
Most of the related techniques above build medical-industry systems with conventional methods and analyze medical text data relatively little, while the success of intelligent-healthcare algorithms, such as prediction and intelligent question-answering applications, depends to a large degree on feature extraction and data representation.
The content of the invention
The purpose of the present invention is to address the above deficiencies of the prior art by providing a medical question-and-answer semantic clustering method based on integrated convolutional encoding that clusters Chinese medical questions and answers. The method mainly performs unsupervised deep-feature analysis of the collected medical question-and-answer corpus, and learns features from the corpus data autonomously through a multi-convolution auto-encoder network, overcoming the limitations of supervised feature-space definition, acquiring knowledge automatically and associating a concise, general representation, so that the medical question-and-answer corpus can be clustered effectively. Concretely, integrated convolutional neural networks and auto-encoder techniques from deep learning are used to realize unsupervised semantic clustering of high-dimensional, sparse medical data. The scheme comprises: 1. preprocessing of the medical question-and-answer text data; 2. selection of multiple convolution kernels; 3. fusion of the multi-convolution feature representations; 4. construction of the IEHC model (an Inception Convolutional Ensemble Auto-Encoders model for Chinese Healthcare questions Clustering) based on convolution and auto-encoders; 5. semantic clustering of the medical question-and-answer text.
The purpose of the present invention can be achieved through the following technical solutions:
A medical question-and-answer semantic clustering method based on integrated convolutional encoding; the basic idea of the method is as follows. First, the relevant question-and-answer corpus is obtained from a medical platform; these high-dimensional, sparse, noisy data are divided into 3 data sets, the balance of the data is ensured, and the relevant preprocessing is performed. Second, since a single convolution kernel can only obtain one kind of feature representation while multiple convolution kernels can extract diverse features, the selection of convolution kernels becomes a key issue. Then, it must be considered how the features produced by the multiple convolution kernels can be fused efficiently so that all of them are exploited. Finally, a multi-convolution encoding model is built from the fused features combined with the relevant properties of the auto-encoder. The clustering accuracy of each batch of data is obtained with the designed model and compared experimentally with related methods; throughout this process the parameters of the model designed by the present invention are continuously tuned.
The medical question-and-answer semantic clustering method based on integrated convolutional encoding of the present invention comprises the following steps:
(1) Preprocessing the crawled medical text and performing word-vector modeling
The preprocessing in this step includes word segmentation and stop-word removal; because the collected corpus is very large, the data are processed in batches. "Word vectors" refers to the neural-network word embedding algorithm proposed by Google in 2013 and its accompanying modeling tool word2vec. First, all the segmented texts are fed into the word-vector modeling tool, which performs word-vector modeling on them. The result of the modeling is that every word occurring in the text data, other than the stop words, is uniformly mapped into a vector space of fixed dimension (the size of the dimension can be adjusted manually). The input data is the set of all texts inputData = {d_1, d_2, ..., d_m}, and each text is a set of words d = {w_1, w_2, ..., w_size(t)}. After modeling, each word is represented as a fixed-dimension vector in the mapping space, written w = (e_1, e_2, ..., e_n).
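As a minimal sketch of the mapping described above, the snippet below turns each word of a segmented text into a fixed-dimension vector. The hash-based `toy_embed` is only a deterministic stand-in for a trained embedding lookup (a real system would train gensim's word2vec on the segmented corpus, as the patent does); all names here are illustrative, not from the patent.

```python
import hashlib

def toy_embed(word, dim=8):
    """Deterministic stand-in for a trained word2vec lookup:
    maps a word to a fixed-dimension vector in [0, 1]^dim."""
    h = hashlib.md5(word.encode("utf-8")).digest()
    return [h[i % len(h)] / 255.0 for i in range(dim)]

def embed_text(tokens, dim=8):
    # A text d = {w_1, ..., w_size(t)} becomes a list of word vectors.
    return [toy_embed(w, dim) for w in tokens]

doc = ["头痛", "发烧", "怎么办"]        # a segmented question, stop words removed
matrix = embed_text(doc)
print(len(matrix), len(matrix[0]))      # 3 8 : one row per word, 8 dims each
```

The resulting per-text matrix is the input that the convolutional layers operate on.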
(2) Building the IEHC model from convolutional neural networks and auto-encoders
In deep convolutional neural networks, the more filters are used, the better the spatial dimension is preserved; but as depth increases, the complexity of the network increases accordingly. The present invention therefore optimizes the network structure to reduce network complexity. Because the invention performs unsupervised cluster analysis of medical text, a medical question-and-answer clustering method based on multi-convolution encoding is proposed. Using small-scale convolution kernels has two main advantages: 1) it controls the number of trainable parameters in the whole network and reduces network complexity; 2) convolution kernels of different sizes perform multi-scale feature extraction on the input data. In the model of the present invention, convolution kernels of different scales are applied to the input, the different feature representations are fused, the fused result serves as the input to the encoder, and convolutional decoding is then performed; the model parameters are continuously tuned with the loss function and stochastic gradient descent so that the features extracted by the model reach their best effect and the medical question-and-answer texts are clustered optimally.
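The multi-scale idea, kernels of different widths extracting features at different scales, can be sketched with a toy 1-D convolution in plain Python (a didactic sketch, not the patent's actual network):

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation form) of a scalar
    sequence with a kernel of a given width."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

# Kernels of different widths respond to structure at different scales,
# as the model does with its multiple convolution kernels.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
small = conv1d(signal, [1.0, -1.0])        # width-2 kernel: local differences
large = conv1d(signal, [1.0, 0.0, -1.0])   # width-3 kernel: wider differences
print(small)  # [-1.0, -1.0, -1.0, -1.0]
print(large)  # [-2.0, -2.0, -2.0]
```

Note how each kernel width yields a feature map of a different length, which is why the fusion step later has to reconcile the widths.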
1. Introduction to the basic models
A typical convolutional neural network is mainly composed of an input layer, convolutional layers, down-sampling (pooling) layers, fully connected layers and an output layer. The input of the convolutional neural network is X, and the convolutional feature of layer i is H_i; the convolution is computed as H_i = f(H_{i-1} ⊗ W_i + b_i), where W_i is the weight vector of the layer-i convolution kernels, ⊗ denotes the convolution of the weights with the features of layer i-1, the result is added to the bias vector b_i of the layer, and the layer-i convolutional feature H_i is finally obtained through the activation function f(·).
A convolutional layer is generally followed by a down-sampling layer, which down-samples the feature maps according to a fixed down-sampling rule. The down-sampling layer mainly has two functions: 1) it reduces the dimensionality of the feature maps; 2) it preserves, to a certain extent, the scale invariance of the features. Through the alternation of multiple convolutional and down-sampling layers, the convolutional neural network classifies the extracted features with a fully connected network and obtains a probability distribution Y over the input. A convolutional neural network is essentially a mathematical model that maps the original matrix, through multiple levels of data transformation or dimensionality reduction, to a new feature representation.
The training objective of a convolutional neural network is to minimize the loss function L(W, b) of the network. After the input X passes through forward propagation, its difference from the target value, called the "residual", is computed by the loss function. Common loss functions include the mean squared error (MSE) function and the negative log-likelihood (NLL) function. During training, the usual optimization method for convolutional neural networks is gradient descent: the residual is back-propagated by gradient descent, updating the trainable parameters (W and b) of every layer of the network in turn.
An auto-encoder mainly consists of encoding and decoding: the input is fed into an encoder to obtain a code representation, the code is then decoded, and related information is output. If this output is highly similar to the input data, the code is a valid representation of the input. The relevant weights of the model are adjusted according to the error between the input data and the output data, features are produced by the encoder, and the network is trained layer by layer, finally forming a stable auto-encoder.
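A minimal sketch of this encode/decode loop, under heavy simplification: a one-unit linear auto-encoder y = w2·(w1·x), trained by gradient descent on the squared reconstruction error. All parameter names and hyperparameters are illustrative, not the patent's.

```python
import random

def train_autoencoder(data, epochs=200, lr=0.01):
    """Toy auto-encoder: encode z = w1*x, decode y = w2*z, and adjust
    the weights from the input/output error, as described above."""
    random.seed(0)
    w1, w2 = random.uniform(0.1, 0.5), random.uniform(0.1, 0.5)
    loss = float("inf")
    for _ in range(epochs):
        for x in data:
            z = w1 * x          # encode
            y = w2 * z          # decode
            err = y - x         # reconstruction residual
            g2 = err * z        # gradient of 0.5*err^2 w.r.t. w2
            g1 = err * w2 * x   # gradient of 0.5*err^2 w.r.t. w1
            w2 -= lr * g2
            w1 -= lr * g1
        loss = sum((w2 * w1 * x - x) ** 2 for x in data) / len(data)
    return w1, w2, loss

w1, w2, loss = train_autoencoder([0.5, 1.0, 1.5])
print(round(loss, 6))   # very small: the code w1*x reconstructs the input
```

Once the reconstruction error is small, the code z is taken as the representation of the input, which is exactly the role the encoder output plays in the IEHC model.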
2. Selection of convolution kernels
Because the convolution kernels play a key role in the convolution of the text, the selection of kernels is crucial to the whole IEHC model. The present invention starts from the clustering result of each single convolution kernel, obtains the kernels that need to be combined from the diversity and quality of those clustering results, and finally performs a weighted combination of the different kernels. Let the set of convolution kernels be K = {k_1, k_2, ..., k_n}. The experimental result of each kernel is obtained in turn, and the diversity and quality of the clustering results of the different kernels are compared; if the diversity is large, the features they obtain are different, and the combined result can be expected to be better.
During kernel selection, the difference between the clustering effects obtained by the individual kernels is considered in deciding which kernels to choose. The greater the experimental difference, the lower the correlation between the clustering results, and the better the effect of the cluster-ensemble learning. The present invention therefore measures the degree of difference between convolution kernels through the normalized mutual information (NMI):

NMI(C_a, C_b) = [ Σ_{h=1..k_a} Σ_{l=1..k_b} n_{h,l} log( n · n_{h,l} / (n_h^a · n_l^b) ) ] / sqrt( [ Σ_{h=1..k_a} n_h^a log(n_h^a / n) ] · [ Σ_{l=1..k_b} n_l^b log(n_l^b / n) ] )

where k_a and k_b are the numbers of clusters in the clustering results C_a and C_b of two different convolution kernels, n is the total number of data points, n_{h,l} is the number of data points that lie simultaneously in the h-th cluster of C_a and the l-th cluster of C_b, n_h^a is the number of data points in the h-th cluster of C_a, and n_l^b is the number of data points in the l-th cluster of C_b. The larger NMI(C_a, C_b), the smaller the difference between the two clusterings; the normalized mutual information is therefore transformed into a diversity measure:
Div(C_a, C_b) = 1 - NMI(C_a, C_b)
Div(C_a, C_b) is the diversity value of the clustering results of two different convolution kernels; the smaller the value, the less correlation there is between the clusterings. The N kernel combinations with the largest mean and maximum diversity, kerSet = {ks_1, ks_2, ..., ks_N}, are chosen as parts of the final experimental model, where ks_n = {k_{D1}, k_{D2}, ..., k_{Di}} and k_{Di} is a convolution kernel in that combination.
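The NMI and the diversity transform can be sketched directly from the definitions above (a from-scratch sketch over toy label lists; a production system would typically use a library implementation):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / sqrt(H(A)*H(B)) between
    two clusterings given as per-point label lists."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * math.log((n * c) / (ca[h] * cb[l]))
             for (h, l), c in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    return mi / math.sqrt(ha * hb)

def diversity(labels_a, labels_b):
    """Div(C_a, C_b) = 1 - NMI(C_a, C_b): large when the two kernels'
    clusterings disagree, which is what kernel selection looks for."""
    return 1.0 - nmi(labels_a, labels_b)

a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]          # same partition, clusters relabeled
c = [0, 1, 0, 1, 2, 2]          # a genuinely different partition
print(round(nmi(a, b), 3))      # 1.0 : identical partitions, zero diversity
print(diversity(a, c) > 0.0)    # True : different partitions, positive diversity
```

Note that NMI is invariant to cluster relabeling, so only real structural disagreement between kernels produces diversity.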
Another evaluation index for kernel selection is SNMI, the average of the sum of the NMI values between a kernel's clustering and the clustering results obtained by the other kernels, specifically calculated as:

SNMI(k_j) = (1 / (n_k - 1)) Σ_{i ≠ j} NMI(C_{k_j}, C_{k_i})

where n_k is the number of convolution kernels and C_{k_j} is the clustering result of kernel k_j.
Combining the two indices above, the final selection of convolution kernels chooses kernels whose clustering quality (SNMI) is high while the diversity (Div) between the chosen kernels remains large.
3. Fusion of the feature representations of different convolution kernels
The present invention proposes four different fusion methods in total. They can be summarized along two axes, feature dependence and feature combination mode: whether dependence exists between the feature representations obtained by the different convolution kernels, and how those features are combined. The four methods are irrelevant coalescence, irrelevant serial, associated coalescence and associated serial.
a): Irrelevant Coalescence (IC)

Let F = {F^{(k_1)}, F^{(k_2)}, ..., F^{(k_N)}} denote the set of convolutional feature representations that need to be fused. In irrelevant coalescence, the feature representations of the different convolution kernels have no dependence on each other; they are all mutually independent. In the present invention this is expressed as an element-wise sum:

F^{IC} = Σ_{i=1..N} F^{(k_i)}

It is required that the number of neurons obtained by each convolution kernel be equal, because the features are summed in the fusion process; the width of the representation after fusion therefore equals the width of each input. In the fusion process of this model, we fuse with the summation method: the corresponding feature representations obtained by the different convolution kernels are added.
b): Irrelevant Serial (IS)

Irrelevant serial fusion splices the feature representations of the different convolution kernels together by concatenation, again without any dependence relation:

F^{IS} = [F^{(k_1)}, F^{(k_2)}, ..., F^{(k_N)}]

The fused feature representation produced by this operation has a width equal to the sum of the widths of all its convolutional feature representations, enlarging the dimension of this layer: width(F^{IS}) = Σ_i width(F^{(k_i)}).
c): Associated Coalescence (AC)

The following two methods differ from the previous two mainly in that feature dependence exists: the feature after a convolution kernel is related to its previous feature representation, and for different tasks different mapping relations g_i exist. This is expressed as:

F^{AC} = Σ_{i=1..N} g_i(F^{(k_i)})

Like IC, this method is a superposition in terms of feature combination, and the width is the same as that of the feature representations before fusion.
d): Associated Serial (AS)

Associated serial fusion mainly combines feature dependence with feature splicing, specifically:

F^{AS} = [g_1(F^{(k_1)}), g_2(F^{(k_2)}), ..., g_N(F^{(k_N)})]
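Under simplified assumptions (feature maps as flat Python lists, and a scalar weight per kernel standing in for each mapping relation), the four fusion modes can be sketched as:

```python
def fuse_ic(features):
    """Irrelevant Coalescence: element-wise sum of equal-width maps."""
    return [sum(col) for col in zip(*features)]

def fuse_is(features):
    """Irrelevant Serial: concatenation; width grows to the sum of widths."""
    return [v for f in features for v in f]

def fuse_ac(features, weights):
    """Associated Coalescence: each kernel's map first passes through its
    own mapping (here a scalar weight), then the maps are summed."""
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*features)]

def fuse_as(features, weights):
    """Associated Serial: mapped features, then concatenated."""
    return [w * v for f, w in zip(features, weights) for v in f]

f1, f2 = [1.0, 2.0], [3.0, 4.0]       # maps from two convolution kernels
print(fuse_ic([f1, f2]))              # [4.0, 6.0]
print(fuse_is([f1, f2]))              # [1.0, 2.0, 3.0, 4.0]
print(fuse_ac([f1, f2], [0.5, 1.0]))  # [3.5, 5.0]
print(fuse_as([f1, f2], [0.5, 1.0]))  # [0.5, 1.0, 3.0, 4.0]
```

The coalescence variants keep the input width, while the serial variants expand it, matching the width remarks in the text above.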
4. Overall model framework

The selection of convolution kernels and the fusion of feature representations having been described above, this section gives a comprehensive summary of the overall model, covering its different operations from the medical text input to the final clustering result.
a): Pooling and activation operations

The purpose of the pooling layer is to reduce the size of the feature maps while keeping them invariant to small shifts. It is normally placed between two convolutional layers; in the present invention it is first placed after the input layer, so that each feature map of the original input is obtained and connected to the next convolutional layer. Function activation then follows the convolution operation; the calculation is:

x_{i,j}^{l+1} = f( Σ_{(m,n)} w_{m,n}^{l} · x_{m,n}^{l} + b^{l} )

where (m, n) ranges over the neighboring units of (i, j), x_{m,n}^{l} and x_{i,j}^{l+1} are the neurons at positions (m, n) and (i, j) of layers l and l+1 respectively (l is updated to denote the next layer after each operation, l+1 denoting the layer following l, which simplifies the notation after each level of transformation), and w^{l} and b^{l} are the corresponding weights and bias vector on layer l; the value x_{i,j}^{l+1} is obtained by calling the activation function.
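A toy 1-D sketch of the pool-then-convolve-then-activate step just described (non-overlapping max pooling, ReLU as the activation f; all choices illustrative):

```python
def relu(x):
    return max(0.0, x)

def max_pool(seq, width=2):
    """Non-overlapping max pooling: halves the width (for width=2)
    while keeping the strongest response in each neighbourhood."""
    return [max(seq[i:i + width]) for i in range(0, len(seq) - width + 1, width)]

def conv_activate(seq, kernel, bias):
    """Valid 1-D convolution followed by the activation function:
    x^{l+1} = f(sum_m w_m * x^l_m + b)."""
    k = len(kernel)
    return [relu(sum(seq[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(seq) - k + 1)]

x = [1.0, 3.0, 2.0, 5.0, 4.0, 1.0]
pooled = max_pool(x)
print(pooled)                                    # [3.0, 5.0, 4.0]
print(conv_activate(pooled, [1.0, -1.0], 0.0))   # [0.0, 1.0]
```

Pooling halving the width is also why the later up-sampling step (section d) is needed to restore the input size for the auto-encoder loss.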
b): Dropout operation

The dropout operation is mainly intended to prevent over-fitting. Its main idea is to randomly interrupt the connections between layers during the training process, which prevents co-adaptation in the neural network. It is expressed as:

r^{l} ~ Bernoulli(p),  x^{l+1} = r^{l} ⊙ x^{l}

where the Bernoulli function generates a random 0/1 vector r^{l} with probability p; the neurons of the next layer x^{l+1} are obtained by randomly interrupting the neurons x^{l} of the previous layer with the vector r^{l} (element-wise product).
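The Bernoulli masking can be sketched in a few lines (a plain sketch of the masking step only; production dropout usually also rescales the kept units by 1/p at training time, which is omitted here):

```python
import random

def dropout(values, p, seed=None):
    """Bernoulli dropout: each unit is kept with probability p and
    zeroed otherwise, breaking co-adaptation between layers."""
    rng = random.Random(seed)
    mask = [1 if rng.random() < p else 0 for _ in values]
    return [m * v for m, v in zip(mask, values)], mask

layer = [0.2, 0.7, 1.3, 0.4, 0.9]
dropped, mask = dropout(layer, p=0.6, seed=42)
print(mask)      # a 0/1 vector; each unit survived with probability 0.6
print(dropped)   # surviving activations, dropped units zeroed
```

At test time no mask is applied, so the full network is used for inference.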
c): Fusion and encoding of the different convolutional features

The operations above act on all convolution kernels simultaneously. To make maximal use of the known conditions and the associated space, this step fuses the different feature representations obtained; the fusion methods were given in step 3. Because the present invention extracts useful features by unsupervised learning, a traditional auto-encoder is used for feature self-learning, and the fusion result is therefore encoded. Writing x_{i,j}^{l,k_c} for the value of the k_c-th kernel at unit (i, j) of layer l, the features learned by the different convolution kernels on the different neurons are fused (shown here for the coalescence case) to obtain the feature representation of the next layer,

F_{i,j}^{l+1} = Σ_{k_c} x_{i,j}^{l,k_c}

and these feature representations are fed into the encoding model and encoded, obtaining the code h.
d): Up-sampling and decoding operations

After the pooling operation, the width of the input matrix becomes half of the original, while auto-encoder learning needs the input and output dimensions to be equal in order to compute the loss. The encoded result therefore needs to be up-sampled to recover the size of the input sample before pooling, and the up-sampled result is then decoded. Up-sampling the encoded features yields the expanded representation, in which each neuron element (i, j) belongs to a neighbourhood expanded from a unit (r, s) of the code; the decoding operation applied to the up-sampled features finally gives the output Y.
e): Calculating the loss function

The output of the whole model is obtained after encoding; the unsupervised self-learned feature representation of the auto-encoder is produced by encoding the input, and the decoded reconstruction of the input is the output obtained in the previous step, so the loss function of the whole learning network is L(X, Y). For the auto-encoder, the loss function can be calculated in the following two ways:

L(x, y) = Σ_n (x_n - y_n)^2   or   L(x, y) = L_2(x, y) = ||x - y||_2

where x is the input vector, y is the final output vector of the model, x_n and y_n are their respective n-th components, and L_2(x, y) is the L2 norm. After the initial loss function is obtained, the weights and biases of the whole model are continuously adjusted by stochastic gradient descent so that the features learned by the model become optimal; these features finally serve as the representation of the input medical question-and-answer text for semantic clustering, yielding the experimental results.
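The two loss forms just given can be written directly (a straightforward sketch; the mean-squared variant divides by the length, a common normalization):

```python
import math

def sse_loss(x, y):
    """Sum of squared reconstruction errors between input and output."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def mse_loss(x, y):
    """Mean squared error: the sum-of-squares form averaged over components."""
    return sse_loss(x, y) / len(x)

def l2_loss(x, y):
    """L2 norm of the residual, the second form mentioned above."""
    return math.sqrt(sse_loss(x, y))

x_in  = [1.0, 2.0, 3.0]   # auto-encoder input
y_out = [1.0, 2.5, 2.0]   # reconstructed output
print(mse_loss(x_in, y_out))   # (0 + 0.25 + 1.0) / 3
print(l2_loss(x_in, y_out))    # sqrt(1.25)
```

Either form vanishes exactly when the reconstruction matches the input, which is what gradient descent drives the model toward.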
f): Evaluation criteria

The evaluation indices for the experimental results are the normalized mutual information (NMI), the Adjusted Rand Index (ARI) and the average precision (AP), expressed respectively as:

RI = (a + b) / C_p,   ARI = (RI - E[RI]) / (max(RI) - E[RI]),   AP = (1/N) Σ_i P_i

where a is the number of element pairs assigned to the same class by both clustering methods, b is the number of element pairs assigned to different classes by both methods, C_p is the total number of element pairs, n_{ij}, a_i, b_j are the respective values of the contingency matrix, RI is the Rand Index, E[RI] is the expected value of the RI index, max(RI) is the maximum of the RI values, and ARI, the Adjusted Rand Index, is computed comprehensively from these quantities. AP is the average precision, obtained by averaging the precision P_i of each data set. The previously obtained experimental results are assessed with these evaluation indices to judge how well the different experimental methods perform in medical text semantic clustering.
(3) Medical question-and-answer text clustering

Because the present invention clusters medical question-and-answer text data, initial features are extracted by multi-kernel convolution and the architecture of the deep convolutional neural network is optimized; the loss function is minimized and the various parameters are optimized according to the unsupervised auto-encoder; a useful feature representation is finally obtained and the text is clustered.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention handles the clustering of Chinese medical questions and answers with a clustering method based on multi-convolution encoding; it performs cluster analysis on the collected data and overcomes the limitations of supervised learning, so that the related diseases a patient may suffer from can be predicted. Medical automatic question answering can thus be realized well, answering patients' questions in a targeted way and providing effective answer schemes. The method resolves the limitations of supervised learning: given the collected high-dimensional, sparse medical text data sets as input, it achieves better clustering results than traditional unsupervised clustering algorithms and the currently still immature neural-network clustering methods.
2. By comparison with traditional unsupervised medical clustering algorithms, the present invention has great advantages in accuracy, stability and robustness. Compared with conventional methods, the technical scheme of the present invention contains the following innovations: first, multiple convolution kernels are selected according to the diversity and quality within the ensemble; second, multi-method fusion is applied to the feature representations produced by the different convolution kernels; third, convolutional neural networks and auto-encoders are combined and applied to medical question-and-answer text.
Brief description of the drawings
Fig. 1 is a flow chart of a medical question-and-answer semantic clustering method based on integrated convolutional encoding according to an embodiment of the present invention.
Fig. 2 is an architecture diagram of a medical question-and-answer semantic clustering method based on integrated convolutional encoding according to an embodiment of the present invention.
Fig. 3 is a table comparing the Adjusted Rand Index (ARI) and NMI of the embodiment of the present invention with traditional unsupervised clustering algorithms and different deep learning methods on different data sets.
Fig. 4 is a comparison chart of the clustering effects of the embodiment of the present invention under different feature fusion methods.
Embodiment
The present invention is described in further detail below with reference to embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment:
This embodiment provides a medical question-and-answer semantic clustering method based on integrated convolutional encoding. The method realizes semantic clustering of medical text data based on an integrated convolutional encoding model; its flow chart is shown in Fig. 1 and its architecture in Fig. 2. The method comprises the following steps:
Step 1: Obtain a medical question-and-answer data set from a medical platform, pre-process it, and obtain the input matrix;
Specifically, pre-processing the medical question-and-answer data set means performing word segmentation, stop-word removal and part-of-speech tagging on it, and then representing each input text as a matrix according to the word-vector representation of its tokens, which yields the input matrix.
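A minimal sketch of Step 1 (not the patented pipeline itself): tokens of a question are filtered against a stop-word list, looked up in a word-vector table, and stacked into a fixed-size input matrix. The tiny 4-dimensional embedding table and the English tokens below are illustrative stand-ins; a real system would use a Chinese segmenter (e.g. jieba) and pretrained word vectors.

```python
EMBEDDINGS = {                      # hypothetical word vectors, dim = 4
    "fever":    [0.4, -0.1, 0.2, 0.0],
    "headache": [0.1, 0.3, -0.2, 0.5],
    "cause":    [0.0, 0.2, 0.1, -0.3],
}
STOPWORDS = {"the", "of", "a"}
DIM = 4

def to_input_matrix(tokens, max_len=6):
    """Drop stop words, map the rest to word vectors, zero-pad to max_len rows."""
    rows = [EMBEDDINGS[t] for t in tokens
            if t not in STOPWORDS and t in EMBEDDINGS]
    rows = rows[:max_len]
    while len(rows) < max_len:
        rows.append([0.0] * DIM)    # pad shorter questions with zero vectors
    return rows

matrix = to_input_matrix(["the", "cause", "of", "fever", "headache"])
```

Each question thus becomes a fixed-shape matrix (here 6 rows of 4 values) suitable as input to the convolutional encoding network.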
Step 2: With the convolutional encoding network, apply different convolution kernels to the input matrices and perform kernel clustering; compute the quality and diversity of the kernel clustering results, and according to quality and diversity select the n convolution kernels that best represent the text features;
Further, the higher the quality value of a kernel clustering result, the better the clustering; it is expressed as follows:

$$SNMI(C_{k_j}) = \sum_{i=1, i \neq j}^{n} NMI(C_{k_i}, C_{k_j})$$

Wherein, $K = \{1, 2, \ldots, k_n\}$ is the convolution kernel set, $C_{k_j}$ is the clustering result obtained by the $k_j$-th convolution kernel, and SNMI is the total (averaged) NMI value between the $k_j$-th kernel's clustering result and those of the other kernels. The degree of difference between different convolution kernels is obtained through the normalized mutual information NMI:

$$NMI(C_a, C_b) = \frac{\sum_{h=1}^{k_a} \sum_{l=1}^{k_b} n_{h,l} \log\!\left(\frac{n \cdot n_{h,l}}{n_h^a \cdot n_l^b}\right)}{\sqrt{\left(\sum_{h=1}^{k_a} n_h^a \log\frac{n_h^a}{n}\right)\left(\sum_{l=1}^{k_b} n_l^b \log\frac{n_l^b}{n}\right)}}$$

Wherein, $k_a$ and $k_b$ are the numbers of clusters in the clustering results $C_a$ and $C_b$ of two different convolution kernels, $n$ is the total number of data points, $n_{h,l}$ is the number of data points lying in both the $h$-th cluster of $C_a$ and the $l$-th cluster of $C_b$, $n_h^a$ is the number of data points in the $h$-th cluster of $C_a$, and $n_l^b$ is the number of data points in the $l$-th cluster of $C_b$. The larger the value of $NMI(C_a, C_b)$, the smaller the difference between the two clusterings;
The diversity used to assess the kernel clusterings is obtained by transforming the normalized mutual information NMI:

$$Div(C_a, C_b) = 1 - NMI(C_a, C_b)$$

$Div(C_a, C_b)$ is the diversity value between the clustering results of two different convolution kernels; the smaller the value, the stronger the association between the clusterings;
Combining the clustering-quality and diversity evaluation criteria, the final evaluation is as follows:

$$Ker = \alpha \sum_{a=1}^{k} SNMI(C_a, O) + (1 - \alpha) \sum_{a \neq b}^{k} Div(C_a, C_b)$$

Wherein, Ker is the overall evaluation value of a convolution kernel's clustering result, $\alpha$ is the weight of clustering quality, and $1 - \alpha$ is the weight of diversity. Fig. 3 compares the Adjusted Rand Index (ARI) and NMI of this embodiment with traditional unsupervised clustering algorithms and different deep learning methods on different data sets, where K-means clusters the word vectors directly; AE is the clustering result after an auto-encoder; AE+WD (auto-encoder with weight-decay regularization) adds a weight regularization term to the auto-encoder; CNN+AE+SF (convolutional AE with single filter) is the convolution result obtained with a single kernel; CNN+AE+MF (convolutional AE with multiple filters) is a multi-layer convolutional neural network; DAE+MN (denoising auto-encoder with masking noise) is the clustering result after a denoising auto-encoder; IEHC+RKS1 and IEHC+RKS2 are results of the IEHC model with randomly selected convolution kernels; and IEHC+TKS1 and IEHC+TKS2 are the experimental results of the two convolution kernels evaluated best on diversity and quality.
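A sketch of the Step 2 selection criterion, assuming hard cluster labels are available for each convolution kernel. `nmi()` follows the NMI formula given above; `ker_score()` (an illustrative helper, not named in the patent) combines average quality (NMI against the other kernels) and average diversity (1 − NMI) with weight alpha, in the spirit of the Ker expression.

```python
import math
from collections import Counter

def nmi(ca, cb):
    """Normalized mutual information between two cluster-label lists."""
    n = len(ca)
    na, nb = Counter(ca), Counter(cb)
    nab = Counter(zip(ca, cb))
    mi = sum(c * math.log(n * c / (na[h] * nb[l]))
             for (h, l), c in nab.items())
    ha = sum(c * math.log(c / n) for c in na.values())   # <= 0
    hb = sum(c * math.log(c / n) for c in nb.values())   # <= 0
    return mi / math.sqrt(ha * hb)

def ker_score(results, j, alpha=0.5):
    """Quality/diversity score of kernel j's clustering against the others."""
    others = [r for i, r in enumerate(results) if i != j]
    quality = sum(nmi(results[j], r) for r in others) / len(others)
    diversity = sum(1.0 - nmi(results[j], r) for r in others) / len(others)
    return alpha * quality + (1.0 - alpha) * diversity
```

The n kernels with the highest scores would then be kept for training in Step 3.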
Step 3: Train each of the convolution kernels selected in Step 2 with a convolutional neural network;
Further, the n convolution kernel combinations $kerSet = \{ks_1, ks_2, \ldots, ks_n\}$ with the largest Ker values are chosen for model training. The value of n is set by the user according to the situation; here n = 3. $ks_n = \{k_{D1}, k_{D2}, \ldots, k_{Di}\}$, where $k_{Di}$ is the number of convolution kernels in the combination. After model training, the initial feature representation corresponding to each convolution kernel is obtained. The specific steps are as follows:
a): Pooling and activation operations
The purpose of the pooling layer is to make the feature maps shift-invariant. After the pooling layer is placed behind the input layer, each feature map obtained from the original input is connected to the next convolutional layer, and the convolution operation is followed by a function activation. The calculation process is as follows:

$$\forall (i, j) \in n_{m,n}$$
$$\hat{x}_{i,j}^{\hat{l}} = \mathrm{maxpooling}(x_{m,n}^{l})$$
$$z_{i,j,k_c}^{l} = w_{k_c}^{l} x_{i,j}^{l} + b_{k_c}^{l}$$
$$\hat{z}_{i,j,k_c}^{\hat{l}} = f_l(z_{i,j,k_c}^{l})$$

Wherein, $(m, n)$ are the neighbouring units of $(i, j)$; $x_{m,n}^{l}$ and $\hat{x}_{i,j}^{\hat{l}}$ are the neurons at positions $(m, n)$ and $(i, j)$ of layer $l$ and layer $l+1$ respectively; after every operation $l$ is updated to the next layer's representation, with $\hat{l}$ denoting the layer following $l$, which simplifies the notation across layer changes; $w_{k_c}^{l}$ and $b_{k_c}^{l}$ are the corresponding weights and bias vector of layer $l$; the value $\hat{z}_{i,j,k_c}^{\hat{l}}$ is obtained through the activation function;
b): Dropout operation
The Dropout operation prevents over-fitting: during training, connections between layers are interrupted at random, preventing co-adaptation within the neural network. Its specific expression is as follows:

$$r_a^{l} = \mathrm{Bernoulli}(p)$$
$$\hat{z}_{i,j,k}^{\hat{l}} = r_a^{l} * z_{i,j,k_c}^{l}$$

Wherein the Bernoulli function generates a random 0/1 vector $r_a^{l}$ with probability $p$; multiplying the neurons of the previous layer $z_{i,j,k_c}^{l}$ by this vector randomly interrupts them, giving the neuron values of the next layer $\hat{z}_{i,j,k}^{\hat{l}}$.
The above two steps produce the feature representation results of the different convolution kernels; these are then fused so that the space is used efficiently.
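A toy sketch of the Step 3 operations on a 1-D feature vector: non-overlapping max pooling, ReLU activation, and Bernoulli dropout with keep-probability p. This illustrates the individual operations only; the patent applies them inside a full convolutional network.

```python
import random

def max_pool(xs, size=2):
    """Non-overlapping 1-D max pooling (stride == size)."""
    return [max(xs[i:i + size]) for i in range(0, len(xs), size)]

def relu(xs):
    """Elementwise ReLU activation, one common choice of f_l."""
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, rng=None):
    """Keep each unit with probability p (a Bernoulli mask), zero the rest."""
    rng = rng or random.Random(0)
    return [x if rng.random() < p else 0.0 for x in xs]

pooled = max_pool([1.0, 3.0, -2.0, 0.5])   # halves the width
activated = relu([-1.0, 2.0])
masked = dropout(activated, p=0.5)
```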
Step 4: Fuse the feature representation results of the different convolution kernels;
There are four different ways of fusing the feature representation results of the different convolution kernels:
a): Irrelevant Coalescence (IC)
Let $R_j^{L}$ ($j = 1, \ldots, n$) denote the convolution feature representations to be fused. Irrelevant coalescence means that the feature representations of the different convolution kernels have no dependence on each other and are completely independent; it is expressed as:

$$R^{L+1} = \mathop{IC}_{j=1}^{n}\left(R_j^{L}\right)$$

This requires the number of neurons obtained by each convolution kernel to be equal, i.e. $|R_1^{L}| = |R_2^{L}| = \cdots = |R_n^{L}|$, because for the features being fused the width of the fused representation equals the input width. In the fusion process of this model, summation is used: the corresponding features obtained by the different convolution kernels are added, with the specific formula:

$$R^{L+1} = \sum_{j=1}^{n} R_j^{L}$$
b): Irrelevant Serial (IS)
Irrelevant serial concatenates the feature representations of the different convolution kernels in series; again no dependence exists. It is expressed as follows:

$$R^{L+1} = \mathop{IS}_{j=1}^{n}\left(R_j^{L}\right)$$

The width of the resulting fused feature representation equals the sum of the widths of all the convolution feature representations; this method enlarges the dimension of the layer;
c): Associated Coalescence (AC)
The following two methods differ from the previous two mainly in that feature dependence exists: the features of a later convolution kernel are related to the representation of the previous features, and different mapping relations exist for different tasks. It is expressed as follows:

$$R^{L+1} = \mathop{AC}_{j=1}^{n}\left(R_j^{L}\right)$$

Like IC, this method superimposes the features, and the width is the same as that of the feature representations before fusion;
d): Associated Serial (AS)
Associated serial combines feature dependence with feature concatenation; it is specifically expressed as follows:

$$R^{L+1} = \mathop{AS}_{j=1}^{n}\left(R_j^{L}\right)$$

Fig. 4 compares the clustering results of this embodiment under the different feature fusion methods. The four methods can be selected freely according to the designed model; for different tasks and corpora, different fusion methods have different effects.
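A sketch of the two dependence-free fusion modes of Step 4: IC sums equal-width representations element-wise (output width unchanged), while IS concatenates them (output width is the sum of the input widths). The association-based modes (AC, AS) would additionally apply a task-dependent mapping between kernels, which is omitted here.

```python
def fuse_ic(reps):
    """Irrelevant coalescence: element-wise sum of equal-width representations."""
    assert all(len(r) == len(reps[0]) for r in reps), "IC needs equal widths"
    return [sum(col) for col in zip(*reps)]

def fuse_is(reps):
    """Irrelevant serial: concatenation; width grows to the sum of widths."""
    return [x for r in reps for x in r]
```

For example, fusing two width-2 representations yields width 2 under IC but width 4 under IS, which is why IS is said to enlarge the dimension of the layer.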
Step 5: Input the fused feature representation into the auto-encoder and perform input-reconstruction training to obtain the best feature representation;
The specific flow is as follows:
a): Encoding the fused convolution feature representations
Feature self-learning is carried out with a conventional auto-encoder, which encodes the fused convolution feature representations:

$$\hat{z}_{i,j,k}^{\hat{l}} = \mathop{Merge}_{c=1}^{N}\left(z_{i,j,k_c}^{l}\right)$$
$$\hat{y}_{i,j,k}^{\hat{l}} = f_{encoded}^{l}\left(z_{i,j,k}^{l}\right)$$

Wherein, $z_{i,j,k_c}^{l}$ is the value of the $k_c$-th convolution kernel at unit $(i, j)$ of layer $l$. The features learned by the different convolution kernels on different neurons are merged to obtain the next layer's feature representation $\hat{z}_{i,j,k}^{\hat{l}}$; these feature representations are fed into the encoding model and encoded to obtain $\hat{y}_{i,j,k}^{\hat{l}}$;
b): Up-sampling and decoding operations
After the pooling operation, the width of the input matrix is reduced to half of the original, while auto-encoder training requires input and output of equal dimension to compute the loss. The encoded result is therefore up-sampled to recover the size of the original input sample, and the up-sampled result is then decoded:

$$y_{r,s,k}^{\hat{l}} = \mathrm{upsampling}\left(y_{i,j,k}^{l}\right)$$
$$\forall (i, j) \in n_{r,s}$$
$$y_{r,s,k}^{\hat{l}} = f_{decoded}^{l}\left(y_{r,s,k}^{l}\right)$$

Wherein, up-sampling the encoded features yields $y_{r,s,k}^{\hat{l}}$, with each neuron unit $(i, j)$ belonging to $(r, s)$; the sampled features are decoded to give the final output;
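A sketch of the up-sampling in Step 5 b): since 2x pooling halved the width, each encoded value is repeated so that the decoder output matches the original input width and the reconstruction loss can be computed. Nearest-neighbour repetition is one simple choice; the patent does not fix the interpolation scheme.

```python
def upsample(xs, factor=2):
    """Repeat each value `factor` times, restoring the pre-pooling width."""
    return [x for x in xs for _ in range(factor)]

restored = upsample([0.3, 0.7])   # width 2 -> width 4
```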
c): Computing the loss function
The loss function of the whole learning network is $L(X, Y)$; the auto-encoder loss can be computed in the following two ways:

$$L(x, y) = L_2(x, y) = \frac{1}{2N}\left\|x - y\right\|^2$$
$$L(x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[x_n \log y_n + (1 - x_n)\log(1 - y_n)\right]$$

Wherein, $x$ is the input vector, $y$ is the final output vector of the model, $x_n$ and $y_n$ are their $n$-th values, and $L_2(x, y)$ is the $L_2$ norm. After the initial loss is obtained, the weights and biases of the whole model are adjusted continuously by stochastic gradient descent so that the features learned by the model become as good as possible;
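A sketch of the two reconstruction losses above: the squared-error ($L_2$) form and the cross-entropy form. The `eps` guard on the logarithm is an added safeguard, not part of the patent's formulas; the choice between the two losses depends on whether the inputs are real-valued or lie in [0, 1].

```python
import math

def l2_loss(x, y):
    """L(x, y) = ||x - y||^2 / (2N)"""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * len(x))

def ce_loss(x, y, eps=1e-12):
    """L(x, y) = -(1/N) * sum_n [x_n log y_n + (1 - x_n) log(1 - y_n)]"""
    return -sum(a * math.log(b + eps) + (1 - a) * math.log(1 - b + eps)
                for a, b in zip(x, y)) / len(x)
```

A perfect reconstruction drives both losses to (essentially) zero; stochastic gradient descent then follows the gradient of whichever loss is chosen.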
d): Model evaluation
The evaluation indices of the experimental results are the normalized mutual information NMI, ARI and the average precision AP, expressed respectively as follows:

$$RI = \frac{a + b}{C_p} = \sum_{ij}\binom{n_{ij}}{2}$$
$$E[RI] = \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right] \bigg/ C_p$$
$$\max(RI) = \frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right]$$
$$ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]}$$

Wherein, $a$ is the number of element pairs assigned to the same class by both clustering methods, $b$ is the number of element pairs assigned to different classes by both methods, $C_p$ is the total number of element pairs, and $n_{ij}$, $a_i$, $b_j$ are the corresponding values of the contingency matrix; RI is the Rand Index, $E[RI]$ is its expected value, and $\max(RI)$ is its maximum value; ARI is the Adjusted Rand Index, computed comprehensively from the above indices; AP is the average precision, obtained by averaging the precision $P_i$ of each data set.
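A sketch of the ARI evaluation above, computed from the contingency table of two label lists via pair counts ($\binom{n}{2}$), matching the formula $ARI = (RI - E[RI]) / (\max(RI) - E[RI])$.

```python
from collections import Counter

def comb2(n):
    """Number of unordered pairs among n items: n choose 2."""
    return n * (n - 1) // 2

def ari(la, lb):
    """Adjusted Rand Index between two cluster-label lists."""
    nij = Counter(zip(la, lb))           # contingency-table cells n_ij
    ai = Counter(la)                     # row sums a_i
    bj = Counter(lb)                     # column sums b_j
    sum_ij = sum(comb2(c) for c in nij.values())
    sum_a = sum(comb2(c) for c in ai.values())
    sum_b = sum(comb2(c) for c in bj.values())
    cp = comb2(len(la))                  # total number of element pairs C_p
    expected = sum_a * sum_b / cp        # E[RI]
    max_index = (sum_a + sum_b) / 2.0    # max(RI)
    return (sum_ij - expected) / (max_index - expected)
```

ARI is invariant under relabelling of the clusters and is 1.0 for identical partitions, which the library implementations (e.g. scikit-learn's `adjusted_rand_score`) also satisfy.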
Step 6: Cluster the best feature representation obtained by encoding to obtain the final semantic clustering result of the medical text.
The above is only a preferred embodiment of the present patent, but the protection scope of the present patent is not limited thereto. Any person skilled in the art who, within the scope disclosed by the present patent, makes equivalent substitutions or changes according to the technical scheme and inventive concept of the present patent falls within the protection scope of the present patent.

Claims (6)

  1. A medical question-and-answer semantic clustering method based on integrated convolutional encoding, characterised in that the method comprises the following steps:
    Step 1: Obtain a medical question-and-answer data set from a medical platform, pre-process it, and obtain the input matrix;
    Step 2: With the convolutional encoding network, apply different convolution kernels to the input matrices and perform kernel clustering; compute the quality and diversity of the kernel clustering results, and according to quality and diversity select the n convolution kernels that best represent the text features;
    Step 3: Train each of the convolution kernels selected in Step 2 with a convolutional neural network;
    Step 4: Fuse the feature representation results of the different convolution kernels;
    Step 5: Input the fused feature representation into the auto-encoder and perform input-reconstruction training to obtain the best feature representation;
    Step 6: Cluster the best feature representation obtained by encoding to obtain the final semantic clustering result of the medical text.
  2. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 1, characterised in that: pre-processing the medical question-and-answer data set in step 1 means performing word segmentation, stop-word removal and part-of-speech tagging on it, and then representing each input text as a matrix according to the word-vector representation of its tokens, which yields the input matrix.
  3. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 1, characterised in that the higher the quality value of a kernel clustering result in step 2, the better the clustering; it is expressed as follows:

    $$SNMI(C_{k_j}) = \sum_{i=1, i \neq j}^{n} NMI(C_{k_i}, C_{k_j})$$

    Wherein, $K = \{1, 2, \ldots, k_n\}$ is the convolution kernel set, $C_{k_j}$ is the clustering result obtained by the $k_j$-th convolution kernel, and SNMI is the total (averaged) NMI value between the $k_j$-th kernel's clustering result and those of the other kernels. The degree of difference between different convolution kernels is obtained through the normalized mutual information NMI:

    $$NMI(C_a, C_b) = \frac{\sum_{h=1}^{k_a} \sum_{l=1}^{k_b} n_{h,l} \log\!\left(\frac{n \cdot n_{h,l}}{n_h^a \cdot n_l^b}\right)}{\sqrt{\left(\sum_{h=1}^{k_a} n_h^a \log\frac{n_h^a}{n}\right)\left(\sum_{l=1}^{k_b} n_l^b \log\frac{n_l^b}{n}\right)}}$$

    Wherein, $k_a$ and $k_b$ are the numbers of clusters in the clustering results $C_a$ and $C_b$ of two different convolution kernels, $n$ is the total number of data points, $n_{h,l}$ is the number of data points lying in both the $h$-th cluster of $C_a$ and the $l$-th cluster of $C_b$, $n_h^a$ is the number of data points in the $h$-th cluster of $C_a$, and $n_l^b$ is the number of data points in the $l$-th cluster of $C_b$. The larger the value of $NMI(C_a, C_b)$, the smaller the difference between the two clusterings;
    The diversity used to assess the kernel clusterings is obtained by transforming the normalized mutual information NMI:

    $$Div(C_a, C_b) = 1 - NMI(C_a, C_b)$$

    $Div(C_a, C_b)$ is the diversity value between the clustering results of two different convolution kernels; the smaller the value, the stronger the association between the clusterings;
    Combining the clustering-quality and diversity evaluation criteria, the final evaluation is as follows:

    $$Ker = \alpha \sum_{a=1}^{k} SNMI(C_a, O) + (1 - \alpha) \sum_{a \neq b}^{k} Div(C_a, C_b)$$

    Wherein, Ker is the overall evaluation value of a convolution kernel's clustering result, $\alpha$ is the weight of clustering quality, and $1 - \alpha$ is the weight of diversity.
  4. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 3, characterised in that: the n convolution kernel combinations $kerSet = \{ks_1, ks_2, \ldots, ks_n\}$ with the largest Ker values are chosen for model training; the value of n is set by the user according to the situation, here n = 3; $ks_n = \{k_{D1}, k_{D2}, \ldots, k_{Di}\}$, where $k_{Di}$ is the number of convolution kernels in the combination. After model training, the initial feature representation corresponding to each convolution kernel is obtained. The specific steps are as follows:
    a): Pooling and activation operations
    The purpose of the pooling layer is to make the feature maps shift-invariant. After the pooling layer is placed behind the input layer, each feature map obtained from the original input is connected to the next convolutional layer, and the convolution operation is followed by a function activation. The calculation process is as follows:

    $$\forall (i, j) \in n_{m,n}$$
    $$\hat{x}_{i,j}^{\hat{l}} = \mathrm{maxpooling}(x_{m,n}^{l})$$
    $$z_{i,j,k_c}^{l} = w_{k_c}^{l} x_{i,j}^{l} + b_{k_c}^{l}$$
    $$\hat{z}_{i,j,k_c}^{\hat{l}} = f_l(z_{i,j,k_c}^{l})$$

    Wherein, $(m, n)$ are the neighbouring units of $(i, j)$; $x_{m,n}^{l}$ and $\hat{x}_{i,j}^{\hat{l}}$ are the neurons at positions $(m, n)$ and $(i, j)$ of layer $l$ and layer $l+1$ respectively; after every operation $l$ is updated to the next layer's representation, with $\hat{l}$ denoting the layer following $l$, which simplifies the notation across layer changes; $w_{k_c}^{l}$ and $b_{k_c}^{l}$ are the corresponding weights and bias vector of layer $l$; the value $\hat{z}_{i,j,k_c}^{\hat{l}}$ is obtained through the activation function;
    b): Dropout operation
    The Dropout operation prevents over-fitting: during training, connections between layers are interrupted at random, preventing co-adaptation within the neural network. Its specific expression is as follows:

    $$r_a^{l} = \mathrm{Bernoulli}(p)$$
    $$\hat{z}_{i,j,k}^{\hat{l}} = r_a^{l} * z_{i,j,k_c}^{l}$$

    Wherein the Bernoulli function generates a random 0/1 vector $r_a^{l}$ with probability $p$; multiplying the neurons of the previous layer $z_{i,j,k_c}^{l}$ by this vector randomly interrupts them, giving the neuron values of the next layer $\hat{z}_{i,j,k}^{\hat{l}}$.
    The above two steps produce the feature representation results of the different convolution kernels; these are then fused so that the space is used efficiently.
  5. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 4, characterised in that there are four different ways of fusing the feature representation results of the different convolution kernels:
    a): Irrelevant Coalescence
    Let $R_j^{L}$ ($j = 1, \ldots, n$) denote the convolution feature representations to be fused. Irrelevant coalescence means that the feature representations of the different convolution kernels have no dependence on each other and are completely independent; it is expressed as:

    $$R^{L+1} = \mathop{IC}_{j=1}^{n}\left(R_j^{L}\right)$$

    This requires the number of neurons obtained by each convolution kernel to be equal, i.e. $|R_1^{L}| = |R_2^{L}| = \cdots = |R_n^{L}|$, because for the features being fused the width of the fused representation equals the input width. In the fusion process of this model, summation is used: the corresponding features obtained by the different convolution kernels are added, with the specific formula:

    $$R^{L+1} = \sum_{j=1}^{n} R_j^{L}$$
    b): Irrelevant Serial
    Irrelevant serial concatenates the feature representations of the different convolution kernels in series; again no dependence exists. It is expressed as follows:

    $$R^{L+1} = \mathop{IS}_{j=1}^{n}\left(R_j^{L}\right)$$

    The width of the resulting fused feature representation equals the sum of the widths of all the convolution feature representations; this method enlarges the dimension of the layer;
    c): Associated Coalescence
    The following two methods differ from the previous two mainly in that feature dependence exists: the features of a later convolution kernel are related to the representation of the previous features, and different mapping relations exist for different tasks. It is expressed as follows:

    $$R^{L+1} = \mathop{AC}_{j=1}^{n}\left(R_j^{L}\right)$$

    Like irrelevant coalescence, this method superimposes the features, and the width is the same as that of the feature representations before fusion;
    d): Associated Serial
    Associated serial combines feature dependence with feature concatenation; it is specifically expressed as follows:

    $$R^{L+1} = \mathop{AS}_{j=1}^{n}\left(R_j^{L}\right)$$

    The four methods can be selected freely according to the designed model; for different tasks and corpora, different fusion methods have different effects.
  6. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 5, characterised in that the specific flow of inputting the fused feature representation into the auto-encoder in step 5 and performing input-reconstruction training to obtain the best feature representation is as follows:
    a): Encoding the fused convolution feature representations
    Feature self-learning is carried out with a conventional auto-encoder, which encodes the fused convolution feature representations:

    $$\hat{z}_{i,j,k}^{\hat{l}} = \mathop{Merge}_{c=1}^{N}\left(z_{i,j,k_c}^{l}\right)$$
    $$\hat{y}_{i,j,k}^{\hat{l}} = f_{encoded}^{l}\left(z_{i,j,k}^{l}\right)$$

    Wherein, $z_{i,j,k_c}^{l}$ is the value of the $k_c$-th convolution kernel at unit $(i, j)$ of layer $l$. The features learned by the different convolution kernels on different neurons are merged to obtain the next layer's feature representation $\hat{z}_{i,j,k}^{\hat{l}}$; these feature representations are fed into the encoding model and encoded to obtain $\hat{y}_{i,j,k}^{\hat{l}}$;
    b): Up-sampling and decoding operations
    After the pooling operation, the width of the input matrix is reduced to half of the original, while auto-encoder training requires input and output of equal dimension to compute the loss. The encoded result is therefore up-sampled to recover the size of the original input sample, and the up-sampled result is then decoded:

    $$y_{r,s,k}^{\hat{l}} = \mathrm{upsampling}\left(y_{i,j,k}^{l}\right)$$
    $$\forall (i, j) \in n_{r,s}$$
    $$y_{r,s,k}^{\hat{l}} = f_{decoded}^{l}\left(y_{r,s,k}^{l}\right)$$

    Wherein, up-sampling the encoded features yields $y_{r,s,k}^{\hat{l}}$, with each neuron unit $(i, j)$ belonging to $(r, s)$; the sampled features are decoded to give the final output;
    c): Computing the loss function
    The loss function of the whole learning network is $L(X, Y)$; the auto-encoder loss can be computed in the following two ways:

    $$L(x, y) = L_2(x, y) = \frac{1}{2N}\left\|x - y\right\|^2$$
    $$L(x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[x_n \log y_n + (1 - x_n)\log(1 - y_n)\right]$$

    Wherein, $x$ is the input vector, $y$ is the final output vector of the model, $x_n$ and $y_n$ are their $n$-th values, and $L_2(x, y)$ is the $L_2$ norm. After the initial loss is obtained, the weights and biases of the whole model are adjusted continuously by stochastic gradient descent so that the features learned by the model become as good as possible;
    d):Model is evaluated
    Experimental result evaluation index has normalized mutual information NMI, ARI and Average Accuracy AR, represents as follows respectively:
    <mrow> <mi>R</mi> <mi>I</mi> <mo>=</mo> <mfrac> <mrow> <mi>a</mi> <mo>+</mo> <mi>b</mi> </mrow> <msub> <mi>C</mi> <mi>p</mi> </msub> </mfrac> <mo>=</mo> <munder> <mo>&amp;Sigma;</mo> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </munder> <mfenced open = "(" close = ")"> <mtable> <mtr> <mtd> <msub> <mi>n</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <mn>2</mn> </mtd> </mtr> </mtable> </mfenced> </mrow>
    E[RI] = \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ C_p
    \max(RI) = \frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right]
    ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / C_p}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / C_p}
    AP = \frac{\sum_{i=1}^{n} P_i}{n}
    where a is the number of element pairs that both clustering results assign to the same class, b is the number of element pairs that both results assign to different classes, C_p is the total number of element pairs, and n_ij, a_i, and b_j are the corresponding entries and marginals of the contingency matrix. RI is the Rand Index, E[RI] is its expected value, max(RI) is its maximum, and ARI is the Adjusted Rand Index obtained by combining these quantities. AP is the average accuracy, obtained by averaging the per-dataset accuracies P_i.
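    As a sketch of how the ARI and AP formulas above can be evaluated (the function names and the contingency-table construction are ours, given only as an illustration under the standard definitions of n_ij, a_i, b_j, and C_p):

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings."""
    classes, class_idx = np.unique(labels_true, return_inverse=True)
    clusters, cluster_idx = np.unique(labels_pred, return_inverse=True)
    # Contingency table: n_ij counts elements in true class i and cluster j.
    table = np.zeros((len(classes), len(clusters)), dtype=np.int64)
    for ci, cj in zip(class_idx, cluster_idx):
        table[ci, cj] += 1

    def comb2(v):
        # "v choose 2", elementwise on arrays.
        return v * (v - 1) // 2

    sum_ij = comb2(table).sum()               # sum over (n_ij choose 2)
    sum_a = comb2(table.sum(axis=1)).sum()    # row marginals a_i
    sum_b = comb2(table.sum(axis=0)).sum()    # column marginals b_j
    n_pairs = comb2(len(labels_true))         # C_p: total number of pairs
    expected = sum_a * sum_b / n_pairs        # E[RI]
    max_index = 0.5 * (sum_a + sum_b)         # max(RI)
    return (sum_ij - expected) / (max_index - expected)

def average_precision(per_set_accuracies):
    """AP: mean of the per-dataset accuracies P_i."""
    return sum(per_set_accuracies) / len(per_set_accuracies)
```

    Two identical partitions (up to a relabeling of the clusters) score ARI = 1, while independent partitions score near 0, which is what makes ARI a chance-corrected version of the Rand Index.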
CN201710723583.4A 2017-08-22 2017-08-22 Medical question-answer semantic clustering method based on integrated convolutional coding Active CN107516110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710723583.4A CN107516110B (en) 2017-08-22 2017-08-22 Medical question-answer semantic clustering method based on integrated convolutional coding


Publications (2)

Publication Number Publication Date
CN107516110A true CN107516110A (en) 2017-12-26
CN107516110B CN107516110B (en) 2020-02-18

Family

ID=60723274


Country Status (1)

Country Link
CN (1) CN107516110B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108333959A (en) * 2018-03-09 2018-07-27 清华大学 A kind of energy saving method of operating of locomotive based on convolutional neural networks model
CN108491431A (en) * 2018-02-09 2018-09-04 淮阴工学院 A kind of mixing recommendation method based on self-editing ink recorder and cluster
CN108806785A (en) * 2018-05-29 2018-11-13 四川长虹电器股份有限公司 A kind of diagnosis and treatment section office recommendation method and system based on convolutional neural networks
CN108846503A (en) * 2018-05-17 2018-11-20 电子科技大学 A kind of respiratory disease illness person-time dynamic prediction method neural network based
CN108899064A (en) * 2018-05-31 2018-11-27 平安医疗科技有限公司 Electronic health record generation method, device, computer equipment and storage medium
CN109271898A (en) * 2018-08-31 2019-01-25 电子科技大学 Solution cavity body recognizer based on optimization convolutional neural networks
CN109299274A (en) * 2018-11-07 2019-02-01 南京大学 A kind of natural scene Method for text detection based on full convolutional neural networks
CN109360658A (en) * 2018-11-01 2019-02-19 北京航空航天大学 A kind of the disease pattern method for digging and device of word-based vector model
CN109493931A (en) * 2018-10-25 2019-03-19 平安科技(深圳)有限公司 A kind of coding method of patient file, server and computer readable storage medium
CN109559761A (en) * 2018-12-21 2019-04-02 广东工业大学 A kind of risk of stroke prediction technique based on depth phonetic feature
CN109871531A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Hidden feature extracting method, device, computer equipment and storage medium
CN110134791A (en) * 2019-05-21 2019-08-16 北京泰迪熊移动科技有限公司 A kind of data processing method, electronic equipment and storage medium
CN110210350A (en) * 2019-05-22 2019-09-06 北京理工大学 A kind of quick parking space detection method based on deep learning
CN110222772A (en) * 2019-06-10 2019-09-10 浙江大学 A kind of medical image mark recommended method based on block rank Active Learning
CN110321929A (en) * 2019-06-04 2019-10-11 平安科技(深圳)有限公司 A kind of method, apparatus and storage medium for extracting text feature
CN110313894A (en) * 2019-04-15 2019-10-11 四川大学 Arrhythmia cordis sorting algorithm based on convolutional neural networks
CN110427627A (en) * 2019-08-02 2019-11-08 北京百度网讯科技有限公司 Task processing method and device based on semantic expressiveness model
CN110796251A (en) * 2019-10-28 2020-02-14 天津大学 Image compression optimization method based on convolutional neural network
CN111224677A (en) * 2018-11-27 2020-06-02 华为技术有限公司 Encoding method, decoding method and device
CN111598223A (en) * 2020-05-15 2020-08-28 天津科技大学 Network embedding method based on attribute and structure deep fusion and model thereof
CN111667029A (en) * 2020-07-09 2020-09-15 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN112215267A (en) * 2020-09-25 2021-01-12 天津大学 Hyperspectral image-oriented depth space spectrum subspace clustering method
CN112559707A (en) * 2020-12-16 2021-03-26 四川智仟科技有限公司 Knowledge-driven customer service question and answer method
CN112992367A (en) * 2021-03-23 2021-06-18 崔剑虹 Smart medical interaction method based on big data and smart medical cloud computing system
CN113139061A (en) * 2021-05-14 2021-07-20 东北大学 Case feature extraction method based on word vector clustering
CN113159196A (en) * 2021-04-26 2021-07-23 云南大学 Software demand clustering method and system based on regular variation embedding
CN113284627A (en) * 2021-04-15 2021-08-20 北京交通大学 Medication recommendation method based on patient characterization learning
CN113449491A (en) * 2021-07-05 2021-09-28 思必驰科技股份有限公司 Pre-training framework for language understanding and generation with two-stage decoder
CN113611425A (en) * 2021-07-20 2021-11-05 上海齐网网络科技有限公司 Software definition-based intelligent regional medical treatment integrated database method and system
US20210375404A1 (en) * 2019-06-05 2021-12-02 Boe Technology Group Co., Ltd. Medical question-answering method, medical question-answering system, electronic device, and computer readable storage medium
CN116720523A (en) * 2023-04-19 2023-09-08 贵州轻工职业技术学院 Deep text clustering method and device based on multiple cores and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268532A (en) * 2014-09-30 2015-01-07 郑州轻工业学院 Method for efficiently processing large-scale image and video data under network environment
CN104715047A (en) * 2015-03-26 2015-06-17 浪潮集团有限公司 Social network data collecting and analyzing system
CN105469108A (en) * 2015-11-17 2016-04-06 深圳先进技术研究院 Clustering method, clustering system, clustering result evaluation method and clustering result evaluation system based on biological data
CN105677769A (en) * 2015-12-29 2016-06-15 广州神马移动信息科技有限公司 Keyword recommending method and system based on latent Dirichlet allocation (LDA) model
CN106294398A (en) * 2015-05-21 2017-01-04 富士通株式会社 Information processor and information processing method
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method
CN106874367A (en) * 2016-12-30 2017-06-20 江苏号百信息服务有限公司 A kind of sampling distribution formula clustering method based on public sentiment platform
CN106874489A (en) * 2017-02-21 2017-06-20 烟台中科网络技术研究所 A kind of Lung neoplasm image block search method and device based on convolutional neural networks


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAOLI Z. FERN 等: "Cluster Ensemble Selection", 《HTTPS://DOI.ORG/10.1002/SAM.10008》 *
YONGGANG CAO 等: "AskHERMES: An online question answering system for complex clinical questions", 《JOURNAL OF BIOMEDICAL INFORMATICS》 *
ZHIWEN YU 等: "Adaptive Ensembling of Semi-Supervised Clustering Solutions", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 *
申小敏 等: "基于卷积神经网络的大规模人脸聚类", 《广东工业大学学报》 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant