CN107516110A - A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding - Google Patents

A medical question-and-answer semantic clustering method based on integrated convolutional encoding


Publication number
CN107516110A
CN107516110A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710723583.4A
Other languages
Chinese (zh)
Other versions
CN107516110B (en
Inventor
余志文
戴丹
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710723583.4A priority Critical patent/CN107516110B/en
Publication of CN107516110A publication Critical patent/CN107516110A/en
Application granted granted Critical
Publication of CN107516110B publication Critical patent/CN107516110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23211: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with adaptive number of clusters
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention discloses a medical question-and-answer semantic clustering method based on integrated convolutional encoding, in the field of machine learning. The method comprises the following steps: collecting question-and-answer corpora from users of a medical consultation platform; selecting convolution kernels; merging the feature representations produced by different convolution kernels; obtaining the final data representation with an auto-encoder; and performing semantic clustering on the medical questions and answers. Compared with traditional deep learning methods, this method extracts different features with different convolution kernels, so the extracted features are more complete and diverse, and it fuses the extracted features using several different feature combination strategies. The invention therefore generalizes well and achieves high semantic clustering accuracy; it can help users better understand their own condition, assist doctors in disease detection, and has great application value for building medical automatic question-answering systems.

Description

A medical question-and-answer semantic clustering method based on integrated convolutional encoding
Technical field
The present invention relates to the field of artificial intelligence, especially machine learning, and in particular to a medical question-and-answer semantic clustering method based on integrated convolutional encoding.
Background technology
With the rapid development of the Internet, people's way of life is gradually changing. Surveys show that when ordinary users feel physically unwell, about 90% of them search the Internet for related information; the Internet is therefore also changing the ecology of healthcare. In Internet-based medicine, online triage (disease guidance) is an important and critical step, so many online disease question-and-answer websites have appeared in the health-related medical field. By describing their own experience, detailed symptoms, medication and treatment, patients communicate with doctors and obtain disease-related care knowledge. These disease question-and-answer records contain a large amount of disease information from individual cases. If patients' descriptions of the signs of a disease can be obtained from these medical question-and-answer corpora, a great deal of useful information can be mined and understood: retrospective analysis and disease prediction become possible for patients, and a medical automatic question-answering system can more easily understand the questions patients ask. This is of great significance for building intelligent healthcare.
Medical corpus text data is noisy, sparse, high-dimensional, heterogeneous, incomplete and systematically biased, and patients with the same symptoms describe them in different ways. The conventional approach is to have experts choose suitable feature patterns and represent them in a special-purpose way, but this approach defines the feature-space scale with insufficient supervision, generalizes poorly, and misses the opportunity to discover new patterns and features, so conventional methods struggle to characterize and model such data. Unsupervised representation learning, by automatically identifying the patterns and dependencies in the data, tries to overcome the limitations of supervised feature-space definition and to associate a concise, general representation; the automatic acquisition of knowledge makes it simpler to extract usable information when building a classifier or another predictor. In recent years deep learning has been applied more and more widely to image recognition, machine translation, intelligent question answering and other tasks. Its essence is to learn more accurate features by building machine learning models with multiple hidden layers and training them on massive data; it can better capture the internal information of the data and has significant advantages for analyzing unstructured, variable, cross-domain big data, giving such information and knowledge a good representation. In the medical field, however, deep learning techniques are not yet widely applied, and have not been reliably integrated into workflows such as medical intelligent question answering.
At present, governments, major medical institutions and research institutes at home and abroad have invested great manpower, material and financial resources in intelligent healthcare. Abroad, regional collaborative healthcare research started relatively early in countries such as the United States, Canada, the United Kingdom and Australia. Representative work includes Google's use of denoising auto-encoder representation learning on patients' electronic health records to construct a feature space, so as to perform disease prediction for patients and provide related health guidance. Domestically, Ding Xiaohong [Research on health resource sharing mechanisms and question-and-answer recommendation techniques, Xidian University, 2011], on the basis of studying medical and health metadata standards, provided the steps for realizing resource sharing, including resource standardization and data integration; explored a medical resource sharing platform with metadata as its core, "physically distributed, logically unified", based on an SOA architecture; proposed a hierarchical sharing model; and analyzed this sharing platform from five different views. Liu Fang et al. [Research and implementation of an intelligent question-answering system for the medical industry, Microelectronics and Computer, 2012, 11:95-98] proposed a method for understanding users' natural-language questions based on human-computer interaction, in order to realize an intelligent question-answering system for diseases, diagnosis and treatment in medical informatics. Li Chao [Research and application of intelligent triage and medical question-answering methods, Dalian University of Technology, 2016] used big-data technology to study disease triage and automatic disease-knowledge question answering, constructing an online triage model with convolutional neural network models and natural language processing techniques, improving consultation data and feature construction, and providing human-like, authoritative, content-rich medical knowledge. Gong Jibing et al. [A user health-status detection method for medical community networks based on a probabilistic factor graph model, Journal of Computer Research and Development, 2013, 50(6):1285-1296] proposed a new detection and prediction method for network users' health status based on a temporal-spatial factor graph model (TS-FGM), and systematically discussed how the health status of a node in a dynamic social network can be detected and predicted under new conditions, and to what degree different factors influence users' health status.
Most of the related techniques above build medical-industry systems with conventional methods and analyze medical text data relatively little, while the success of intelligent-healthcare algorithms, such as prediction and intelligent question-answering applications, depends to a large degree on feature extraction and data representation.
The content of the invention
The purpose of the present invention is to address the above deficiencies of the prior art by providing a medical question-and-answer semantic clustering method based on integrated convolutional encoding that clusters Chinese medical questions and answers. The method mainly performs unsupervised deep-feature analysis of the collected medical question-and-answer corpus, and learns features from the corpus data autonomously through a multi-convolution auto-encoder network, overcoming the limitations of supervised feature-space definition, acquiring knowledge automatically and associating a concise, general representation, so that the medical question-and-answer corpus can be clustered effectively. Concretely, integrated convolutional neural networks and auto-encoder techniques from deep learning are used to realize unsupervised semantic clustering of high-dimensional, sparse medical data. The scheme comprises: 1. preprocessing of the medical question-and-answer text data; 2. selection of multiple convolution kernels; 3. fusion of the multi-convolution feature representations; 4. construction of the IEHC model (an Inception Convolutional Ensemble Auto-Encoders model for Chinese Healthcare questions Clustering) based on convolution and auto-encoders; 5. semantic clustering of the medical question-and-answer text.
The purpose of the present invention can be achieved through the following technical solutions:
A medical question-and-answer semantic clustering method based on integrated convolutional encoding; the basic idea of the method is as follows. First, the relevant question-and-answer corpus is obtained from a medical platform; these high-dimensional, sparse, noisy data are divided into 3 data sets, the balance of the data is ensured, and the relevant preprocessing is performed. Second, since a single convolution kernel can only obtain one kind of feature representation while multiple convolution kernels can extract diverse features, the selection of convolution kernels becomes a key issue. Then, it must be considered how the features produced by the multiple convolution kernels can be fused efficiently so that all of them are exploited. Finally, a multi-convolution encoding model is built from the fused features combined with the relevant properties of the auto-encoder. The clustering accuracy of each batch of data is obtained with the designed model and compared experimentally with related methods; throughout this process the parameters of the model designed by the present invention are continuously tuned.
The medical question-and-answer semantic clustering method based on integrated convolutional encoding of the present invention comprises the following steps:
(1) Preprocessing the crawled medical text and performing word-vector modeling
The preprocessing in this step includes word segmentation and stop-word removal; because the collected corpus is very large, the data are processed in batches. "Word vectors" refers to the neural-network word embedding algorithm proposed by Google in 2013 and its accompanying modeling tool word2vec. First, all the segmented texts are fed into the word-vector modeling tool, which performs word-vector modeling on them. The result of the modeling is that every word occurring in the text data, other than the stop words, is uniformly mapped into a vector space of fixed dimension (the size of the dimension can be adjusted manually). The input data is the set of all texts inputData = {d_1, d_2, ..., d_m}, and each text is a set of words d = {w_1, w_2, ..., w_size(t)}. After modeling, each word is represented as a fixed-dimension vector in the mapping space, written w = (e_1, e_2, ..., e_n).
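As a minimal sketch of the mapping described above, the snippet below turns each word of a segmented text into a fixed-dimension vector. The hash-based `toy_embed` is only a deterministic stand-in for a trained embedding lookup (a real system would train gensim's word2vec on the segmented corpus, as the patent does); all names here are illustrative, not from the patent.

```python
import hashlib

def toy_embed(word, dim=8):
    """Deterministic stand-in for a trained word2vec lookup:
    maps a word to a fixed-dimension vector in [0, 1]^dim."""
    h = hashlib.md5(word.encode("utf-8")).digest()
    return [h[i % len(h)] / 255.0 for i in range(dim)]

def embed_text(tokens, dim=8):
    # A text d = {w_1, ..., w_size(t)} becomes a list of word vectors.
    return [toy_embed(w, dim) for w in tokens]

doc = ["头痛", "发烧", "怎么办"]        # a segmented question, stop words removed
matrix = embed_text(doc)
print(len(matrix), len(matrix[0]))      # 3 8 : one row per word, 8 dims each
```

The resulting per-text matrix is the input that the convolutional layers operate on.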
(2) Building the IEHC model from convolutional neural networks and auto-encoders
In deep convolutional neural networks, the more filters are used, the better the spatial dimension is preserved; but as depth increases, the complexity of the network increases accordingly. The present invention therefore optimizes the network structure to reduce network complexity. Because the invention performs unsupervised cluster analysis of medical text, a medical question-and-answer clustering method based on multi-convolution encoding is proposed. Using small-scale convolution kernels has two main advantages: 1) it controls the number of trainable parameters in the whole network and reduces network complexity; 2) convolution kernels of different sizes perform multi-scale feature extraction on the input data. In the model of the present invention, convolution kernels of different scales are applied to the input, the different feature representations are fused, the fused result serves as the input to the encoder, and convolutional decoding is then performed; the model parameters are continuously tuned with the loss function and stochastic gradient descent so that the features extracted by the model reach their best effect and the medical question-and-answer texts are clustered optimally.
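The multi-scale idea, kernels of different widths extracting features at different scales, can be sketched with a toy 1-D convolution in plain Python (a didactic sketch, not the patent's actual network):

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation form) of a scalar
    sequence with a kernel of a given width."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

# Kernels of different widths respond to structure at different scales,
# as the model does with its multiple convolution kernels.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
small = conv1d(signal, [1.0, -1.0])        # width-2 kernel: local differences
large = conv1d(signal, [1.0, 0.0, -1.0])   # width-3 kernel: wider differences
print(small)  # [-1.0, -1.0, -1.0, -1.0]
print(large)  # [-2.0, -2.0, -2.0]
```

Note how each kernel width yields a feature map of a different length, which is why the fusion step later has to reconcile the widths.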
1. Introduction to the basic models
A typical convolutional neural network is mainly composed of an input layer, convolutional layers, down-sampling (pooling) layers, fully connected layers and an output layer. The input of the convolutional neural network is X, and the convolutional feature of layer i is H_i; the convolution is computed as H_i = f(H_{i-1} ⊗ W_i + b_i), where W_i is the weight vector of the layer-i convolution kernels, ⊗ denotes the convolution of the weights with the features of layer i-1, the result is added to the bias vector b_i of the layer, and the layer-i convolutional feature H_i is finally obtained through the activation function f(·).
A convolutional layer is generally followed by a down-sampling layer, which down-samples the feature maps according to a fixed down-sampling rule. The down-sampling layer mainly has two functions: 1) it reduces the dimensionality of the feature maps; 2) it preserves, to a certain extent, the scale invariance of the features. Through the alternation of multiple convolutional and down-sampling layers, the convolutional neural network classifies the extracted features with a fully connected network and obtains a probability distribution Y over the input. A convolutional neural network is essentially a mathematical model that maps the original matrix, through multiple levels of data transformation or dimensionality reduction, to a new feature representation.
The training objective of a convolutional neural network is to minimize the loss function L(W, b) of the network. After the input X passes through forward propagation, its difference from the target value, called the "residual", is computed by the loss function. Common loss functions include the mean squared error (MSE) function and the negative log-likelihood (NLL) function. During training, the usual optimization method for convolutional neural networks is gradient descent: the residual is back-propagated by gradient descent, updating the trainable parameters (W and b) of every layer of the network in turn.
An auto-encoder mainly consists of encoding and decoding: the input is fed into an encoder to obtain a code representation, the code is then decoded, and related information is output. If this output is highly similar to the input data, the code is a valid representation of the input. The relevant weights of the model are adjusted according to the error between the input data and the output data, features are produced by the encoder, and the network is trained layer by layer, finally forming a stable auto-encoder.
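A minimal sketch of this encode/decode loop, under heavy simplification: a one-unit linear auto-encoder y = w2·(w1·x), trained by gradient descent on the squared reconstruction error. All parameter names and hyperparameters are illustrative, not the patent's.

```python
import random

def train_autoencoder(data, epochs=200, lr=0.01):
    """Toy auto-encoder: encode z = w1*x, decode y = w2*z, and adjust
    the weights from the input/output error, as described above."""
    random.seed(0)
    w1, w2 = random.uniform(0.1, 0.5), random.uniform(0.1, 0.5)
    loss = float("inf")
    for _ in range(epochs):
        for x in data:
            z = w1 * x          # encode
            y = w2 * z          # decode
            err = y - x         # reconstruction residual
            g2 = err * z        # gradient of 0.5*err^2 w.r.t. w2
            g1 = err * w2 * x   # gradient of 0.5*err^2 w.r.t. w1
            w2 -= lr * g2
            w1 -= lr * g1
        loss = sum((w2 * w1 * x - x) ** 2 for x in data) / len(data)
    return w1, w2, loss

w1, w2, loss = train_autoencoder([0.5, 1.0, 1.5])
print(round(loss, 6))   # very small: the code w1*x reconstructs the input
```

Once the reconstruction error is small, the code z is taken as the representation of the input, which is exactly the role the encoder output plays in the IEHC model.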
2. Selection of convolution kernels
Because the convolution kernels play a key role in the convolution of the text, the selection of kernels is crucial to the whole IEHC model. The present invention starts from the clustering result of each single convolution kernel, obtains the kernels that need to be combined from the diversity and quality of those clustering results, and finally performs a weighted combination of the different kernels. Let the set of convolution kernels be K = {k_1, k_2, ..., k_n}. The experimental result of each kernel is obtained in turn, and the diversity and quality of the clustering results of the different kernels are compared; if the diversity is large, the features they obtain are different, and the combined result can be expected to be better.
During kernel selection, the difference between the clustering effects obtained by the individual kernels is considered in deciding which kernels to choose. The greater the experimental difference, the lower the correlation between the clustering results, and the better the effect of the cluster-ensemble learning. The present invention therefore measures the degree of difference between convolution kernels through the normalized mutual information (NMI):

NMI(C_a, C_b) = [ Σ_{h=1..k_a} Σ_{l=1..k_b} n_{h,l} log( n · n_{h,l} / (n_h^a · n_l^b) ) ] / sqrt( [ Σ_{h=1..k_a} n_h^a log(n_h^a / n) ] · [ Σ_{l=1..k_b} n_l^b log(n_l^b / n) ] )

where k_a and k_b are the numbers of clusters in the clustering results C_a and C_b of two different convolution kernels, n is the total number of data points, n_{h,l} is the number of data points that lie simultaneously in the h-th cluster of C_a and the l-th cluster of C_b, n_h^a is the number of data points in the h-th cluster of C_a, and n_l^b is the number of data points in the l-th cluster of C_b. The larger NMI(C_a, C_b), the smaller the difference between the two clusterings; the normalized mutual information is therefore transformed into a diversity measure:
Div(C_a, C_b) = 1 - NMI(C_a, C_b)
Div(C_a, C_b) is the diversity value of the clustering results of two different convolution kernels; the smaller the value, the less correlation there is between the clusterings. The N kernel combinations with the largest mean and maximum diversity, kerSet = {ks_1, ks_2, ..., ks_N}, are chosen as parts of the final experimental model, where ks_n = {k_{D1}, k_{D2}, ..., k_{Di}} and k_{Di} is a convolution kernel in that combination.
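The NMI and the diversity transform can be sketched directly from the definitions above (a from-scratch sketch over toy label lists; a production system would typically use a library implementation):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / sqrt(H(A)*H(B)) between
    two clusterings given as per-point label lists."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * math.log((n * c) / (ca[h] * cb[l]))
             for (h, l), c in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    return mi / math.sqrt(ha * hb)

def diversity(labels_a, labels_b):
    """Div(C_a, C_b) = 1 - NMI(C_a, C_b): large when the two kernels'
    clusterings disagree, which is what kernel selection looks for."""
    return 1.0 - nmi(labels_a, labels_b)

a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]          # same partition, clusters relabeled
c = [0, 1, 0, 1, 2, 2]          # a genuinely different partition
print(round(nmi(a, b), 3))      # 1.0 : identical partitions, zero diversity
print(diversity(a, c) > 0.0)    # True : different partitions, positive diversity
```

Note that NMI is invariant to cluster relabeling, so only real structural disagreement between kernels produces diversity.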
Another evaluation index for kernel selection is SNMI, the average of the sum of the NMI values between a kernel's clustering and the clustering results obtained by the other kernels, specifically calculated as:

SNMI(k_j) = (1 / (n_k - 1)) Σ_{i ≠ j} NMI(C_{k_j}, C_{k_i})

where n_k is the number of convolution kernels and C_{k_j} is the clustering result of kernel k_j.
Combining the two indices above, the final selection of convolution kernels chooses kernels whose clustering quality (SNMI) is high while the diversity (Div) between the chosen kernels remains large.
3. Fusion of the feature representations of different convolution kernels
The present invention proposes four different fusion methods in total. They can be summarized along two axes, feature dependence and feature combination mode: whether dependence exists between the feature representations obtained by the different convolution kernels, and how those features are combined. The four methods are irrelevant coalescence, irrelevant serial, associated coalescence and associated serial.
a): Irrelevant Coalescence (IC)

Let F = {F^{(k_1)}, F^{(k_2)}, ..., F^{(k_N)}} denote the set of convolutional feature representations that need to be fused. In irrelevant coalescence, the feature representations of the different convolution kernels have no dependence on each other; they are all mutually independent. In the present invention this is expressed as an element-wise sum:

F^{IC} = Σ_{i=1..N} F^{(k_i)}

It is required that the number of neurons obtained by each convolution kernel be equal, because the features are summed in the fusion process; the width of the representation after fusion therefore equals the width of each input. In the fusion process of this model, we fuse with the summation method: the corresponding feature representations obtained by the different convolution kernels are added.
b): Irrelevant Serial (IS)

Irrelevant serial fusion splices the feature representations of the different convolution kernels together by concatenation, again without any dependence relation:

F^{IS} = [F^{(k_1)}, F^{(k_2)}, ..., F^{(k_N)}]

The fused feature representation produced by this operation has a width equal to the sum of the widths of all its convolutional feature representations, enlarging the dimension of this layer: width(F^{IS}) = Σ_i width(F^{(k_i)}).
c): Associated Coalescence (AC)

The following two methods differ from the previous two mainly in that feature dependence exists: the feature after a convolution kernel is related to its previous feature representation, and for different tasks different mapping relations g_i exist. This is expressed as:

F^{AC} = Σ_{i=1..N} g_i(F^{(k_i)})

Like IC, this method is a superposition in terms of feature combination, and the width is the same as that of the feature representations before fusion.
d): Associated Serial (AS)

Associated serial fusion mainly combines feature dependence with feature splicing, specifically:

F^{AS} = [g_1(F^{(k_1)}), g_2(F^{(k_2)}), ..., g_N(F^{(k_N)})]
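Under simplified assumptions (feature maps as flat Python lists, and a scalar weight per kernel standing in for each mapping relation), the four fusion modes can be sketched as:

```python
def fuse_ic(features):
    """Irrelevant Coalescence: element-wise sum of equal-width maps."""
    return [sum(col) for col in zip(*features)]

def fuse_is(features):
    """Irrelevant Serial: concatenation; width grows to the sum of widths."""
    return [v for f in features for v in f]

def fuse_ac(features, weights):
    """Associated Coalescence: each kernel's map first passes through its
    own mapping (here a scalar weight), then the maps are summed."""
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*features)]

def fuse_as(features, weights):
    """Associated Serial: mapped features, then concatenated."""
    return [w * v for f, w in zip(features, weights) for v in f]

f1, f2 = [1.0, 2.0], [3.0, 4.0]       # maps from two convolution kernels
print(fuse_ic([f1, f2]))              # [4.0, 6.0]
print(fuse_is([f1, f2]))              # [1.0, 2.0, 3.0, 4.0]
print(fuse_ac([f1, f2], [0.5, 1.0]))  # [3.5, 5.0]
print(fuse_as([f1, f2], [0.5, 1.0]))  # [0.5, 1.0, 3.0, 4.0]
```

The coalescence variants keep the input width, while the serial variants expand it, matching the width remarks in the text above.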
4. Overall model framework

The selection of convolution kernels and the fusion of feature representations having been described above, this section gives a comprehensive summary of the overall model, covering its different operations from the medical text input to the final clustering result.
a): Pooling and activation operations

The purpose of the pooling layer is to reduce the size of the feature maps while keeping them invariant to small shifts. It is normally placed between two convolutional layers; in the present invention it is first placed after the input layer, so that each feature map of the original input is obtained and connected to the next convolutional layer. Function activation then follows the convolution operation; the calculation is:

x_{i,j}^{l+1} = f( Σ_{(m,n)} w_{m,n}^{l} · x_{m,n}^{l} + b^{l} )

where (m, n) ranges over the neighboring units of (i, j), x_{m,n}^{l} and x_{i,j}^{l+1} are the neurons at positions (m, n) and (i, j) of layers l and l+1 respectively (l is updated to denote the next layer after each operation, l+1 denoting the layer following l, which simplifies the notation after each level of transformation), and w^{l} and b^{l} are the corresponding weights and bias vector on layer l; the value x_{i,j}^{l+1} is obtained by calling the activation function.
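A toy 1-D sketch of the pool-then-convolve-then-activate step just described (non-overlapping max pooling, ReLU as the activation f; all choices illustrative):

```python
def relu(x):
    return max(0.0, x)

def max_pool(seq, width=2):
    """Non-overlapping max pooling: halves the width (for width=2)
    while keeping the strongest response in each neighbourhood."""
    return [max(seq[i:i + width]) for i in range(0, len(seq) - width + 1, width)]

def conv_activate(seq, kernel, bias):
    """Valid 1-D convolution followed by the activation function:
    x^{l+1} = f(sum_m w_m * x^l_m + b)."""
    k = len(kernel)
    return [relu(sum(seq[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(seq) - k + 1)]

x = [1.0, 3.0, 2.0, 5.0, 4.0, 1.0]
pooled = max_pool(x)
print(pooled)                                    # [3.0, 5.0, 4.0]
print(conv_activate(pooled, [1.0, -1.0], 0.0))   # [0.0, 1.0]
```

Pooling halving the width is also why the later up-sampling step (section d) is needed to restore the input size for the auto-encoder loss.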
b): Dropout operation

The dropout operation is mainly intended to prevent over-fitting. Its main idea is to randomly interrupt the connections between layers during the training process, which prevents co-adaptation in the neural network. It is expressed as:

r^{l} ~ Bernoulli(p),  x^{l+1} = r^{l} ⊙ x^{l}

where the Bernoulli function generates a random 0/1 vector r^{l} with probability p; the neurons of the next layer x^{l+1} are obtained by randomly interrupting the neurons x^{l} of the previous layer with the vector r^{l} (element-wise product).
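The Bernoulli masking can be sketched in a few lines (a plain sketch of the masking step only; production dropout usually also rescales the kept units by 1/p at training time, which is omitted here):

```python
import random

def dropout(values, p, seed=None):
    """Bernoulli dropout: each unit is kept with probability p and
    zeroed otherwise, breaking co-adaptation between layers."""
    rng = random.Random(seed)
    mask = [1 if rng.random() < p else 0 for _ in values]
    return [m * v for m, v in zip(mask, values)], mask

layer = [0.2, 0.7, 1.3, 0.4, 0.9]
dropped, mask = dropout(layer, p=0.6, seed=42)
print(mask)      # a 0/1 vector; each unit survived with probability 0.6
print(dropped)   # surviving activations, dropped units zeroed
```

At test time no mask is applied, so the full network is used for inference.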
c): Fusion and encoding of the different convolutional features

The operations above act on all convolution kernels simultaneously. To make maximal use of the known conditions and the associated space, this step fuses the different feature representations obtained; the fusion methods were given in step 3. Because the present invention extracts useful features by unsupervised learning, a traditional auto-encoder is used for feature self-learning, and the fusion result is therefore encoded. Writing x_{i,j}^{l,k_c} for the value of the k_c-th kernel at unit (i, j) of layer l, the features learned by the different convolution kernels on the different neurons are fused (shown here for the coalescence case) to obtain the feature representation of the next layer,

F_{i,j}^{l+1} = Σ_{k_c} x_{i,j}^{l,k_c}

and these feature representations are fed into the encoding model and encoded, obtaining the code h.
d): Up-sampling and decoding operations

After the pooling operation, the width of the input matrix becomes half of the original, while auto-encoder learning needs the input and output dimensions to be equal in order to compute the loss. The encoded result therefore needs to be up-sampled to recover the size of the input sample before pooling, and the up-sampled result is then decoded. Up-sampling the encoded features yields the expanded representation, in which each neuron element (i, j) belongs to a neighbourhood expanded from a unit (r, s) of the code; the decoding operation applied to the up-sampled features finally gives the output Y.
e): Calculating the loss function

The output of the whole model is obtained after encoding; the unsupervised self-learned feature representation of the auto-encoder is produced by encoding the input, and the decoded reconstruction of the input is the output obtained in the previous step, so the loss function of the whole learning network is L(X, Y). For the auto-encoder, the loss function can be calculated in the following two ways:

L(x, y) = Σ_n (x_n - y_n)^2   or   L(x, y) = L_2(x, y) = ||x - y||_2

where x is the input vector, y is the final output vector of the model, x_n and y_n are their respective n-th components, and L_2(x, y) is the L2 norm. After the initial loss function is obtained, the weights and biases of the whole model are continuously adjusted by stochastic gradient descent so that the features learned by the model become optimal; these features finally serve as the representation of the input medical question-and-answer text for semantic clustering, yielding the experimental results.
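The two loss forms just given can be written directly (a straightforward sketch; the mean-squared variant divides by the length, a common normalization):

```python
import math

def sse_loss(x, y):
    """Sum of squared reconstruction errors between input and output."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def mse_loss(x, y):
    """Mean squared error: the sum-of-squares form averaged over components."""
    return sse_loss(x, y) / len(x)

def l2_loss(x, y):
    """L2 norm of the residual, the second form mentioned above."""
    return math.sqrt(sse_loss(x, y))

x_in  = [1.0, 2.0, 3.0]   # auto-encoder input
y_out = [1.0, 2.5, 2.0]   # reconstructed output
print(mse_loss(x_in, y_out))   # (0 + 0.25 + 1.0) / 3
print(l2_loss(x_in, y_out))    # sqrt(1.25)
```

Either form vanishes exactly when the reconstruction matches the input, which is what gradient descent drives the model toward.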
f): Evaluation criteria

The evaluation indices for the experimental results are the normalized mutual information (NMI), the Adjusted Rand Index (ARI) and the average precision (AP), expressed respectively as:

RI = (a + b) / C_p,   ARI = (RI - E[RI]) / (max(RI) - E[RI]),   AP = (1/N) Σ_i P_i

where a is the number of element pairs assigned to the same class by both clustering methods, b is the number of element pairs assigned to different classes by both methods, C_p is the total number of element pairs, n_{ij}, a_i, b_j are the respective values of the contingency matrix, RI is the Rand Index, E[RI] is the expected value of the RI index, max(RI) is the maximum of the RI values, and ARI, the Adjusted Rand Index, is computed comprehensively from these quantities. AP is the average precision, obtained by averaging the precision P_i of each data set. The previously obtained experimental results are assessed with these evaluation indices to judge how well the different experimental methods perform in medical text semantic clustering.
(3) Medical question-and-answer text clustering

Because the present invention clusters medical question-and-answer text data, initial features are extracted by multi-kernel convolution and the architecture of the deep convolutional neural network is optimized; the loss function is minimized and the various parameters are optimized according to the unsupervised auto-encoder; a useful feature representation is finally obtained and the text is clustered.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention handles the clustering of Chinese medical questions and answers with a clustering method based on multi-convolution encoding; it performs cluster analysis on the collected data and overcomes the limitations of supervised learning, so that the related diseases a patient may suffer from can be predicted. Medical automatic question answering can thus be realized well, answering patients' questions in a targeted way and providing effective answer schemes. The method resolves the limitations of supervised learning: given the collected high-dimensional, sparse medical text data sets as input, it achieves better clustering results than traditional unsupervised clustering algorithms and the currently still immature neural-network clustering methods.
2. By comparison with traditional unsupervised medical clustering algorithms, the present invention has great advantages in accuracy, stability and robustness. Compared with conventional methods, the technical scheme of the present invention contains the following innovations: first, multiple convolution kernels are selected according to the diversity and quality within the ensemble; second, multi-method fusion is applied to the feature representations produced by the different convolution kernels; third, convolutional neural networks and auto-encoders are combined and applied to medical question-and-answer text.
Brief description of the drawings
Fig. 1 is a flow chart of a medical question-and-answer semantic clustering method based on integrated convolutional encoding according to an embodiment of the present invention.
Fig. 2 is an architecture diagram of a medical question-and-answer semantic clustering method based on integrated convolutional encoding according to an embodiment of the present invention.
Fig. 3 is a table comparing the Adjusted Rand Index (ARI) and NMI of the embodiment of the present invention with traditional unsupervised clustering algorithms and different deep learning methods on different data sets.
Fig. 4 is a comparison chart of the clustering effects of the embodiment of the present invention under different feature fusion methods.
Embodiment
The present invention is described in further detail below with reference to embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment:
This embodiment provides a medical question-and-answer semantic clustering method based on integrated convolutional encoding. The method realizes semantic clustering of medical text data based on an integrated convolutional encoding model; its flow chart is shown in Fig. 1 and its architecture in Fig. 2. The method comprises the following steps:
Step 1: Obtain a medical question-and-answer data set from a medical platform, pre-process it, and obtain the input matrix;
Specifically, pre-processing the medical question-and-answer data set means performing word segmentation, stop-word removal and part-of-speech tagging on it, and then representing each input text as a matrix according to the word-vector representation of its tokens, which yields the input matrix.
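A minimal sketch of Step 1 (not the patented pipeline itself): tokens of a question are filtered against a stop-word list, looked up in a word-vector table, and stacked into a fixed-size input matrix. The tiny 4-dimensional embedding table and the English tokens below are illustrative stand-ins; a real system would use a Chinese segmenter (e.g. jieba) and pretrained word vectors.

```python
EMBEDDINGS = {                      # hypothetical word vectors, dim = 4
    "fever":    [0.4, -0.1, 0.2, 0.0],
    "headache": [0.1, 0.3, -0.2, 0.5],
    "cause":    [0.0, 0.2, 0.1, -0.3],
}
STOPWORDS = {"the", "of", "a"}
DIM = 4

def to_input_matrix(tokens, max_len=6):
    """Drop stop words, map the rest to word vectors, zero-pad to max_len rows."""
    rows = [EMBEDDINGS[t] for t in tokens
            if t not in STOPWORDS and t in EMBEDDINGS]
    rows = rows[:max_len]
    while len(rows) < max_len:
        rows.append([0.0] * DIM)    # pad shorter questions with zero vectors
    return rows

matrix = to_input_matrix(["the", "cause", "of", "fever", "headache"])
```

Each question thus becomes a fixed-shape matrix (here 6 rows of 4 values) suitable as input to the convolutional encoding network.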
Step 2: With the convolutional encoding network, apply different convolution kernels to the input matrices and perform kernel clustering; compute the quality and diversity of the kernel clustering results, and according to quality and diversity select the n convolution kernels that best represent the text features;
Further, the higher the quality value of a kernel clustering result, the better the clustering; it is expressed as follows:

$$SNMI(C_{k_j}) = \sum_{i=1, i \neq j}^{n} NMI(C_{k_i}, C_{k_j})$$

Wherein, $K = \{1, 2, \ldots, k_n\}$ is the convolution kernel set, $C_{k_j}$ is the clustering result obtained by the $k_j$-th convolution kernel, and SNMI is the total (averaged) NMI value between the $k_j$-th kernel's clustering result and those of the other kernels. The degree of difference between different convolution kernels is obtained through the normalized mutual information NMI:

$$NMI(C_a, C_b) = \frac{\sum_{h=1}^{k_a} \sum_{l=1}^{k_b} n_{h,l} \log\!\left(\frac{n \cdot n_{h,l}}{n_h^a \cdot n_l^b}\right)}{\sqrt{\left(\sum_{h=1}^{k_a} n_h^a \log\frac{n_h^a}{n}\right)\left(\sum_{l=1}^{k_b} n_l^b \log\frac{n_l^b}{n}\right)}}$$

Wherein, $k_a$ and $k_b$ are the numbers of clusters in the clustering results $C_a$ and $C_b$ of two different convolution kernels, $n$ is the total number of data points, $n_{h,l}$ is the number of data points lying in both the $h$-th cluster of $C_a$ and the $l$-th cluster of $C_b$, $n_h^a$ is the number of data points in the $h$-th cluster of $C_a$, and $n_l^b$ is the number of data points in the $l$-th cluster of $C_b$. The larger the value of $NMI(C_a, C_b)$, the smaller the difference between the two clusterings;
The diversity used to assess the kernel clusterings is obtained by transforming the normalized mutual information NMI:

$$Div(C_a, C_b) = 1 - NMI(C_a, C_b)$$

$Div(C_a, C_b)$ is the diversity value between the clustering results of two different convolution kernels; the smaller the value, the stronger the association between the clusterings;
Combining the clustering-quality and diversity evaluation criteria, the final evaluation is as follows:

$$Ker = \alpha \sum_{a=1}^{k} SNMI(C_a, O) + (1 - \alpha) \sum_{a \neq b}^{k} Div(C_a, C_b)$$

Wherein, Ker is the overall evaluation value of a convolution kernel's clustering result, $\alpha$ is the weight of clustering quality, and $1 - \alpha$ is the weight of diversity. Fig. 3 compares the Adjusted Rand Index (ARI) and NMI of this embodiment with traditional unsupervised clustering algorithms and different deep learning methods on different data sets, where K-means clusters the word vectors directly; AE is the clustering result after an auto-encoder; AE+WD (auto-encoder with weight-decay regularization) adds a weight regularization term to the auto-encoder; CNN+AE+SF (convolutional AE with single filter) is the convolution result obtained with a single kernel; CNN+AE+MF (convolutional AE with multiple filters) is a multi-layer convolutional neural network; DAE+MN (denoising auto-encoder with masking noise) is the clustering result after a denoising auto-encoder; IEHC+RKS1 and IEHC+RKS2 are results of the IEHC model with randomly selected convolution kernels; and IEHC+TKS1 and IEHC+TKS2 are the experimental results of the two convolution kernels evaluated best on diversity and quality.
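A sketch of the Step 2 selection criterion, assuming hard cluster labels are available for each convolution kernel. `nmi()` follows the NMI formula given above; `ker_score()` (an illustrative helper, not named in the patent) combines average quality (NMI against the other kernels) and average diversity (1 − NMI) with weight alpha, in the spirit of the Ker expression.

```python
import math
from collections import Counter

def nmi(ca, cb):
    """Normalized mutual information between two cluster-label lists."""
    n = len(ca)
    na, nb = Counter(ca), Counter(cb)
    nab = Counter(zip(ca, cb))
    mi = sum(c * math.log(n * c / (na[h] * nb[l]))
             for (h, l), c in nab.items())
    ha = sum(c * math.log(c / n) for c in na.values())   # <= 0
    hb = sum(c * math.log(c / n) for c in nb.values())   # <= 0
    return mi / math.sqrt(ha * hb)

def ker_score(results, j, alpha=0.5):
    """Quality/diversity score of kernel j's clustering against the others."""
    others = [r for i, r in enumerate(results) if i != j]
    quality = sum(nmi(results[j], r) for r in others) / len(others)
    diversity = sum(1.0 - nmi(results[j], r) for r in others) / len(others)
    return alpha * quality + (1.0 - alpha) * diversity
```

The n kernels with the highest scores would then be kept for training in Step 3.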
Step 3: Train each of the convolution kernels selected in Step 2 with a convolutional neural network;
Further, the n convolution kernel combinations $kerSet = \{ks_1, ks_2, \ldots, ks_n\}$ with the largest Ker values are chosen for model training. The value of n is set by the user according to the situation; here n = 3. $ks_n = \{k_{D1}, k_{D2}, \ldots, k_{Di}\}$, where $k_{Di}$ is the number of convolution kernels in the combination. After model training, the initial feature representation corresponding to each convolution kernel is obtained. The specific steps are as follows:
a): Pooling and activation operations
The purpose of the pooling layer is to make the feature maps shift-invariant. After the pooling layer is placed behind the input layer, each feature map obtained from the original input is connected to the next convolutional layer, and the convolution operation is followed by a function activation. The calculation process is as follows:

$$\forall (i, j) \in n_{m,n}$$
$$\hat{x}_{i,j}^{\hat{l}} = \mathrm{maxpooling}(x_{m,n}^{l})$$
$$z_{i,j,k_c}^{l} = w_{k_c}^{l} x_{i,j}^{l} + b_{k_c}^{l}$$
$$\hat{z}_{i,j,k_c}^{\hat{l}} = f_l(z_{i,j,k_c}^{l})$$

Wherein, $(m, n)$ are the neighbouring units of $(i, j)$; $x_{m,n}^{l}$ and $\hat{x}_{i,j}^{\hat{l}}$ are the neurons at positions $(m, n)$ and $(i, j)$ of layer $l$ and layer $l+1$ respectively; after every operation $l$ is updated to the next layer's representation, with $\hat{l}$ denoting the layer following $l$, which simplifies the notation across layer changes; $w_{k_c}^{l}$ and $b_{k_c}^{l}$ are the corresponding weights and bias vector of layer $l$; the value $\hat{z}_{i,j,k_c}^{\hat{l}}$ is obtained through the activation function;
b): Dropout operation
The Dropout operation prevents over-fitting: during training, connections between layers are interrupted at random, preventing co-adaptation within the neural network. Its specific expression is as follows:

$$r_a^{l} = \mathrm{Bernoulli}(p)$$
$$\hat{z}_{i,j,k}^{\hat{l}} = r_a^{l} * z_{i,j,k_c}^{l}$$

Wherein the Bernoulli function generates a random 0/1 vector $r_a^{l}$ with probability $p$; multiplying the neurons of the previous layer $z_{i,j,k_c}^{l}$ by this vector randomly interrupts them, giving the neuron values of the next layer $\hat{z}_{i,j,k}^{\hat{l}}$.
The above two steps produce the feature representation results of the different convolution kernels; these are then fused so that the space is used efficiently.
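A toy sketch of the Step 3 operations on a 1-D feature vector: non-overlapping max pooling, ReLU activation, and Bernoulli dropout with keep-probability p. This illustrates the individual operations only; the patent applies them inside a full convolutional network.

```python
import random

def max_pool(xs, size=2):
    """Non-overlapping 1-D max pooling (stride == size)."""
    return [max(xs[i:i + size]) for i in range(0, len(xs), size)]

def relu(xs):
    """Elementwise ReLU activation, one common choice of f_l."""
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, rng=None):
    """Keep each unit with probability p (a Bernoulli mask), zero the rest."""
    rng = rng or random.Random(0)
    return [x if rng.random() < p else 0.0 for x in xs]

pooled = max_pool([1.0, 3.0, -2.0, 0.5])   # halves the width
activated = relu([-1.0, 2.0])
masked = dropout(activated, p=0.5)
```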
Step 4: Fuse the feature representation results of the different convolution kernels;
There are four different ways of fusing the feature representation results of the different convolution kernels:
a): Irrelevant Coalescence (IC)
Let $R_j^{L}$ ($j = 1, \ldots, n$) denote the convolution feature representations to be fused. Irrelevant coalescence means that the feature representations of the different convolution kernels have no dependence on each other and are completely independent; it is expressed as:

$$R^{L+1} = \mathop{IC}_{j=1}^{n}\left(R_j^{L}\right)$$

This requires the number of neurons obtained by each convolution kernel to be equal, i.e. $|R_1^{L}| = |R_2^{L}| = \cdots = |R_n^{L}|$, because for the features being fused the width of the fused representation equals the input width. In the fusion process of this model, summation is used: the corresponding features obtained by the different convolution kernels are added, with the specific formula:

$$R^{L+1} = \sum_{j=1}^{n} R_j^{L}$$
b): Irrelevant Serial (IS)
Irrelevant serial concatenates the feature representations of the different convolution kernels in series; again no dependence exists. It is expressed as follows:

$$R^{L+1} = \mathop{IS}_{j=1}^{n}\left(R_j^{L}\right)$$

The width of the resulting fused feature representation equals the sum of the widths of all the convolution feature representations; this method enlarges the dimension of the layer;
c): Associated Coalescence (AC)
The following two methods differ from the previous two mainly in that feature dependence exists: the features of a later convolution kernel are related to the representation of the previous features, and different mapping relations exist for different tasks. It is expressed as follows:

$$R^{L+1} = \mathop{AC}_{j=1}^{n}\left(R_j^{L}\right)$$

Like IC, this method superimposes the features, and the width is the same as that of the feature representations before fusion;
d): Associated Serial (AS)
Associated serial combines feature dependence with feature concatenation; it is specifically expressed as follows:

$$R^{L+1} = \mathop{AS}_{j=1}^{n}\left(R_j^{L}\right)$$

Fig. 4 compares the clustering results of this embodiment under the different feature fusion methods. The four methods can be selected freely according to the designed model; for different tasks and corpora, different fusion methods have different effects.
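A sketch of the two dependence-free fusion modes of Step 4: IC sums equal-width representations element-wise (output width unchanged), while IS concatenates them (output width is the sum of the input widths). The association-based modes (AC, AS) would additionally apply a task-dependent mapping between kernels, which is omitted here.

```python
def fuse_ic(reps):
    """Irrelevant coalescence: element-wise sum of equal-width representations."""
    assert all(len(r) == len(reps[0]) for r in reps), "IC needs equal widths"
    return [sum(col) for col in zip(*reps)]

def fuse_is(reps):
    """Irrelevant serial: concatenation; width grows to the sum of widths."""
    return [x for r in reps for x in r]
```

For example, fusing two width-2 representations yields width 2 under IC but width 4 under IS, which is why IS is said to enlarge the dimension of the layer.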
Step 5: Input the fused feature representation into the auto-encoder and perform input-reconstruction training to obtain the best feature representation;
The specific flow is as follows:
a): Encoding the fused convolution feature representations
Feature self-learning is carried out with a conventional auto-encoder, which encodes the fused convolution feature representations:

$$\hat{z}_{i,j,k}^{\hat{l}} = \mathop{Merge}_{c=1}^{N}\left(z_{i,j,k_c}^{l}\right)$$
$$\hat{y}_{i,j,k}^{\hat{l}} = f_{encoded}^{l}\left(z_{i,j,k}^{l}\right)$$

Wherein, $z_{i,j,k_c}^{l}$ is the value of the $k_c$-th convolution kernel at unit $(i, j)$ of layer $l$. The features learned by the different convolution kernels on different neurons are merged to obtain the next layer's feature representation $\hat{z}_{i,j,k}^{\hat{l}}$; these feature representations are fed into the encoding model and encoded to obtain $\hat{y}_{i,j,k}^{\hat{l}}$;
b): Up-sampling and decoding operations
After the pooling operation, the width of the input matrix is reduced to half of the original, while auto-encoder training requires input and output of equal dimension to compute the loss. The encoded result is therefore up-sampled to recover the size of the original input sample, and the up-sampled result is then decoded:

$$y_{r,s,k}^{\hat{l}} = \mathrm{upsampling}\left(y_{i,j,k}^{l}\right)$$
$$\forall (i, j) \in n_{r,s}$$
$$y_{r,s,k}^{\hat{l}} = f_{decoded}^{l}\left(y_{r,s,k}^{l}\right)$$

Wherein, up-sampling the encoded features yields $y_{r,s,k}^{\hat{l}}$, with each neuron unit $(i, j)$ belonging to $(r, s)$; the sampled features are decoded to give the final output;
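A sketch of the up-sampling in Step 5 b): since 2x pooling halved the width, each encoded value is repeated so that the decoder output matches the original input width and the reconstruction loss can be computed. Nearest-neighbour repetition is one simple choice; the patent does not fix the interpolation scheme.

```python
def upsample(xs, factor=2):
    """Repeat each value `factor` times, restoring the pre-pooling width."""
    return [x for x in xs for _ in range(factor)]

restored = upsample([0.3, 0.7])   # width 2 -> width 4
```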
c): Computing the loss function
The loss function of the whole learning network is $L(X, Y)$; the auto-encoder loss can be computed in the following two ways:

$$L(x, y) = L_2(x, y) = \frac{1}{2N}\left\|x - y\right\|^2$$
$$L(x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[x_n \log y_n + (1 - x_n)\log(1 - y_n)\right]$$

Wherein, $x$ is the input vector, $y$ is the final output vector of the model, $x_n$ and $y_n$ are their $n$-th values, and $L_2(x, y)$ is the $L_2$ norm. After the initial loss is obtained, the weights and biases of the whole model are adjusted continuously by stochastic gradient descent so that the features learned by the model become as good as possible;
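A sketch of the two reconstruction losses above: the squared-error ($L_2$) form and the cross-entropy form. The `eps` guard on the logarithm is an added safeguard, not part of the patent's formulas; the choice between the two losses depends on whether the inputs are real-valued or lie in [0, 1].

```python
import math

def l2_loss(x, y):
    """L(x, y) = ||x - y||^2 / (2N)"""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * len(x))

def ce_loss(x, y, eps=1e-12):
    """L(x, y) = -(1/N) * sum_n [x_n log y_n + (1 - x_n) log(1 - y_n)]"""
    return -sum(a * math.log(b + eps) + (1 - a) * math.log(1 - b + eps)
                for a, b in zip(x, y)) / len(x)
```

A perfect reconstruction drives both losses to (essentially) zero; stochastic gradient descent then follows the gradient of whichever loss is chosen.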
d): Model evaluation
The evaluation indices of the experimental results are the normalized mutual information NMI, ARI and the average precision AP, expressed respectively as follows:

$$RI = \frac{a + b}{C_p} = \sum_{ij}\binom{n_{ij}}{2}$$
$$E[RI] = \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right] \bigg/ C_p$$
$$\max(RI) = \frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right]$$
$$ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]}$$

Wherein, $a$ is the number of element pairs assigned to the same class by both clustering methods, $b$ is the number of element pairs assigned to different classes by both methods, $C_p$ is the total number of element pairs, and $n_{ij}$, $a_i$, $b_j$ are the corresponding values of the contingency matrix; RI is the Rand Index, $E[RI]$ is its expected value, and $\max(RI)$ is its maximum value; ARI is the Adjusted Rand Index, computed comprehensively from the above indices; AP is the average precision, obtained by averaging the precision $P_i$ of each data set.
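A sketch of the ARI evaluation above, computed from the contingency table of two label lists via pair counts ($\binom{n}{2}$), matching the formula $ARI = (RI - E[RI]) / (\max(RI) - E[RI])$.

```python
from collections import Counter

def comb2(n):
    """Number of unordered pairs among n items: n choose 2."""
    return n * (n - 1) // 2

def ari(la, lb):
    """Adjusted Rand Index between two cluster-label lists."""
    nij = Counter(zip(la, lb))           # contingency-table cells n_ij
    ai = Counter(la)                     # row sums a_i
    bj = Counter(lb)                     # column sums b_j
    sum_ij = sum(comb2(c) for c in nij.values())
    sum_a = sum(comb2(c) for c in ai.values())
    sum_b = sum(comb2(c) for c in bj.values())
    cp = comb2(len(la))                  # total number of element pairs C_p
    expected = sum_a * sum_b / cp        # E[RI]
    max_index = (sum_a + sum_b) / 2.0    # max(RI)
    return (sum_ij - expected) / (max_index - expected)
```

ARI is invariant under relabelling of the clusters and is 1.0 for identical partitions, which the library implementations (e.g. scikit-learn's `adjusted_rand_score`) also satisfy.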
Step 6: Cluster the best feature representation obtained by encoding to obtain the final semantic clustering result of the medical text.
The above is only a preferred embodiment of the present patent, but the protection scope of the present patent is not limited thereto. Any person skilled in the art who, within the scope disclosed by the present patent, makes equivalent substitutions or changes according to the technical scheme and inventive concept of the present patent falls within the protection scope of the present patent.

Claims (6)

  1. A medical question-and-answer semantic clustering method based on integrated convolutional encoding, characterised in that the method comprises the following steps:
    Step 1: Obtain a medical question-and-answer data set from a medical platform, pre-process it, and obtain the input matrix;
    Step 2: With the convolutional encoding network, apply different convolution kernels to the input matrices and perform kernel clustering; compute the quality and diversity of the kernel clustering results, and according to quality and diversity select the n convolution kernels that best represent the text features;
    Step 3: Train each of the convolution kernels selected in Step 2 with a convolutional neural network;
    Step 4: Fuse the feature representation results of the different convolution kernels;
    Step 5: Input the fused feature representation into the auto-encoder and perform input-reconstruction training to obtain the best feature representation;
    Step 6: Cluster the best feature representation obtained by encoding to obtain the final semantic clustering result of the medical text.
  2. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 1, characterised in that: pre-processing the medical question-and-answer data set in step 1 means performing word segmentation, stop-word removal and part-of-speech tagging on it, and then representing each input text as a matrix according to the word-vector representation of its tokens, which yields the input matrix.
  3. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 1, characterised in that the higher the quality value of a kernel clustering result in step 2, the better the clustering; it is expressed as follows:

    $$SNMI(C_{k_j}) = \sum_{i=1, i \neq j}^{n} NMI(C_{k_i}, C_{k_j})$$

    Wherein, $K = \{1, 2, \ldots, k_n\}$ is the convolution kernel set, $C_{k_j}$ is the clustering result obtained by the $k_j$-th convolution kernel, and SNMI is the total (averaged) NMI value between the $k_j$-th kernel's clustering result and those of the other kernels. The degree of difference between different convolution kernels is obtained through the normalized mutual information NMI:

    $$NMI(C_a, C_b) = \frac{\sum_{h=1}^{k_a} \sum_{l=1}^{k_b} n_{h,l} \log\!\left(\frac{n \cdot n_{h,l}}{n_h^a \cdot n_l^b}\right)}{\sqrt{\left(\sum_{h=1}^{k_a} n_h^a \log\frac{n_h^a}{n}\right)\left(\sum_{l=1}^{k_b} n_l^b \log\frac{n_l^b}{n}\right)}}$$

    Wherein, $k_a$ and $k_b$ are the numbers of clusters in the clustering results $C_a$ and $C_b$ of two different convolution kernels, $n$ is the total number of data points, $n_{h,l}$ is the number of data points lying in both the $h$-th cluster of $C_a$ and the $l$-th cluster of $C_b$, $n_h^a$ is the number of data points in the $h$-th cluster of $C_a$, and $n_l^b$ is the number of data points in the $l$-th cluster of $C_b$. The larger the value of $NMI(C_a, C_b)$, the smaller the difference between the two clusterings;
    The diversity used to assess the kernel clusterings is obtained by transforming the normalized mutual information NMI:

    $$Div(C_a, C_b) = 1 - NMI(C_a, C_b)$$

    $Div(C_a, C_b)$ is the diversity value between the clustering results of two different convolution kernels; the smaller the value, the stronger the association between the clusterings;
    Combining the clustering-quality and diversity evaluation criteria, the final evaluation is as follows:

    $$Ker = \alpha \sum_{a=1}^{k} SNMI(C_a, O) + (1 - \alpha) \sum_{a \neq b}^{k} Div(C_a, C_b)$$

    Wherein, Ker is the overall evaluation value of a convolution kernel's clustering result, $\alpha$ is the weight of clustering quality, and $1 - \alpha$ is the weight of diversity.
  4. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 3, characterised in that: the n convolution kernel combinations $kerSet = \{ks_1, ks_2, \ldots, ks_n\}$ with the largest Ker values are chosen for model training; the value of n is set by the user according to the situation, here n = 3; $ks_n = \{k_{D1}, k_{D2}, \ldots, k_{Di}\}$, where $k_{Di}$ is the number of convolution kernels in the combination. After model training, the initial feature representation corresponding to each convolution kernel is obtained. The specific steps are as follows:
    a): Pooling and activation operations
    The purpose of the pooling layer is to make the feature maps shift-invariant. After the pooling layer is placed behind the input layer, each feature map obtained from the original input is connected to the next convolutional layer, and the convolution operation is followed by a function activation. The calculation process is as follows:

    $$\forall (i, j) \in n_{m,n}$$
    $$\hat{x}_{i,j}^{\hat{l}} = \mathrm{maxpooling}(x_{m,n}^{l})$$
    $$z_{i,j,k_c}^{l} = w_{k_c}^{l} x_{i,j}^{l} + b_{k_c}^{l}$$
    $$\hat{z}_{i,j,k_c}^{\hat{l}} = f_l(z_{i,j,k_c}^{l})$$

    Wherein, $(m, n)$ are the neighbouring units of $(i, j)$; $x_{m,n}^{l}$ and $\hat{x}_{i,j}^{\hat{l}}$ are the neurons at positions $(m, n)$ and $(i, j)$ of layer $l$ and layer $l+1$ respectively; after every operation $l$ is updated to the next layer's representation, with $\hat{l}$ denoting the layer following $l$, which simplifies the notation across layer changes; $w_{k_c}^{l}$ and $b_{k_c}^{l}$ are the corresponding weights and bias vector of layer $l$; the value $\hat{z}_{i,j,k_c}^{\hat{l}}$ is obtained through the activation function;
    b): Dropout operation
    The Dropout operation prevents over-fitting: during training, connections between layers are interrupted at random, preventing co-adaptation within the neural network. Its specific expression is as follows:

    $$r_a^{l} = \mathrm{Bernoulli}(p)$$
    $$\hat{z}_{i,j,k}^{\hat{l}} = r_a^{l} * z_{i,j,k_c}^{l}$$

    Wherein the Bernoulli function generates a random 0/1 vector $r_a^{l}$ with probability $p$; multiplying the neurons of the previous layer $z_{i,j,k_c}^{l}$ by this vector randomly interrupts them, giving the neuron values of the next layer $\hat{z}_{i,j,k}^{\hat{l}}$.
    The above two steps produce the feature representation results of the different convolution kernels; these are then fused so that the space is used efficiently.
  5. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 4, characterised in that there are four different ways of fusing the feature representation results of the different convolution kernels:
    a): Irrelevant Coalescence
    Let $R_j^{L}$ ($j = 1, \ldots, n$) denote the convolution feature representations to be fused. Irrelevant coalescence means that the feature representations of the different convolution kernels have no dependence on each other and are completely independent; it is expressed as:

    $$R^{L+1} = \mathop{IC}_{j=1}^{n}\left(R_j^{L}\right)$$

    This requires the number of neurons obtained by each convolution kernel to be equal, i.e. $|R_1^{L}| = |R_2^{L}| = \cdots = |R_n^{L}|$, because for the features being fused the width of the fused representation equals the input width. In the fusion process of this model, summation is used: the corresponding features obtained by the different convolution kernels are added, with the specific formula:

    $$R^{L+1} = \sum_{j=1}^{n} R_j^{L}$$
    b): Irrelevant Serial
    Irrelevant serial concatenates the feature representations of the different convolution kernels in series; again no dependence exists. It is expressed as follows:

    $$R^{L+1} = \mathop{IS}_{j=1}^{n}\left(R_j^{L}\right)$$

    The width of the resulting fused feature representation equals the sum of the widths of all the convolution feature representations; this method enlarges the dimension of the layer;
    c): Associated Coalescence
    The following two methods differ from the previous two mainly in that feature dependence exists: the features of a later convolution kernel are related to the representation of the previous features, and different mapping relations exist for different tasks. It is expressed as follows:

    $$R^{L+1} = \mathop{AC}_{j=1}^{n}\left(R_j^{L}\right)$$

    Like irrelevant coalescence, this method superimposes the features, and the width is the same as that of the feature representations before fusion;
    d): Associated Serial
    Associated serial combines feature dependence with feature concatenation; it is specifically expressed as follows:

    $$R^{L+1} = \mathop{AS}_{j=1}^{n}\left(R_j^{L}\right)$$

    The four methods can be selected freely according to the designed model; for different tasks and corpora, different fusion methods have different effects.
  6. The medical question-and-answer semantic clustering method based on integrated convolutional encoding according to claim 5, characterised in that the specific flow of inputting the fused feature representation into the auto-encoder in step 5 and performing input-reconstruction training to obtain the best feature representation is as follows:
    a): Encoding the fused convolution feature representations
    Feature self-learning is carried out with a conventional auto-encoder, which encodes the fused convolution feature representations:

    $$\hat{z}_{i,j,k}^{\hat{l}} = \mathop{Merge}_{c=1}^{N}\left(z_{i,j,k_c}^{l}\right)$$
    $$\hat{y}_{i,j,k}^{\hat{l}} = f_{encoded}^{l}\left(z_{i,j,k}^{l}\right)$$

    Wherein, $z_{i,j,k_c}^{l}$ is the value of the $k_c$-th convolution kernel at unit $(i, j)$ of layer $l$. The features learned by the different convolution kernels on different neurons are merged to obtain the next layer's feature representation $\hat{z}_{i,j,k}^{\hat{l}}$; these feature representations are fed into the encoding model and encoded to obtain $\hat{y}_{i,j,k}^{\hat{l}}$;
    b): Up-sampling and decoding operations
    After the pooling operation, the width of the input matrix is reduced to half of the original, while auto-encoder training requires input and output of equal dimension to compute the loss. The encoded result is therefore up-sampled to recover the size of the original input sample, and the up-sampled result is then decoded:

    $$y_{r,s,k}^{\hat{l}} = \mathrm{upsampling}\left(y_{i,j,k}^{l}\right)$$
    $$\forall (i, j) \in n_{r,s}$$
    $$y_{r,s,k}^{\hat{l}} = f_{decoded}^{l}\left(y_{r,s,k}^{l}\right)$$

    Wherein, up-sampling the encoded features yields $y_{r,s,k}^{\hat{l}}$, with each neuron unit $(i, j)$ belonging to $(r, s)$; the sampled features are decoded to give the final output;
    c): Computing the loss function
    The loss function of the whole learning network is $L(X, Y)$; the auto-encoder loss can be computed in the following two ways:

    $$L(x, y) = L_2(x, y) = \frac{1}{2N}\left\|x - y\right\|^2$$
    $$L(x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[x_n \log y_n + (1 - x_n)\log(1 - y_n)\right]$$

    Wherein, $x$ is the input vector, $y$ is the final output vector of the model, $x_n$ and $y_n$ are their $n$-th values, and $L_2(x, y)$ is the $L_2$ norm. After the initial loss is obtained, the weights and biases of the whole model are adjusted continuously by stochastic gradient descent so that the features learned by the model become as good as possible;
    d):Model is evaluated
    Experimental result evaluation index has normalized mutual information NMI, ARI and Average Accuracy AR, represents as follows respectively:
    <mrow> <mi>R</mi> <mi>I</mi> <mo>=</mo> <mfrac> <mrow> <mi>a</mi> <mo>+</mo> <mi>b</mi> </mrow> <msub> <mi>C</mi> <mi>p</mi> </msub> </mfrac> <mo>=</mo> <munder> <mo>&amp;Sigma;</mo> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </munder> <mfenced open = "(" close = ")"> <mtable> <mtr> <mtd> <msub> <mi>n</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <mn>2</mn> </mtd> </mtr> </mtable> </mfenced> </mrow>
    E[RI] = \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ C_p
    \max(RI) = \frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right]
    ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / C_p}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / C_p}
    AP = \frac{\sum_{i=1}^{n} P_i}{n}
    where a is the number of element pairs that both clustering results assign to the same class, b is the number of element pairs that both results assign to different classes, C_p is the total number of element pairs, and n_ij, a_i, and b_j are the corresponding entries and marginals of the contingency matrix. RI is the Rand Index, E[RI] is its expected value, max(RI) is its maximum, and ARI is the Adjusted Rand Index obtained by combining these quantities. AP is the average accuracy, obtained by averaging the per-dataset accuracies P_i.
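    As a sketch of how the ARI and AP formulas above can be evaluated (the function names and the contingency-table construction are ours, given only as an illustration under the standard definitions of n_ij, a_i, b_j, and C_p):

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings."""
    classes, class_idx = np.unique(labels_true, return_inverse=True)
    clusters, cluster_idx = np.unique(labels_pred, return_inverse=True)
    # Contingency table: n_ij counts elements in true class i and cluster j.
    table = np.zeros((len(classes), len(clusters)), dtype=np.int64)
    for ci, cj in zip(class_idx, cluster_idx):
        table[ci, cj] += 1

    def comb2(v):
        # "v choose 2", elementwise on arrays.
        return v * (v - 1) // 2

    sum_ij = comb2(table).sum()               # sum over (n_ij choose 2)
    sum_a = comb2(table.sum(axis=1)).sum()    # row marginals a_i
    sum_b = comb2(table.sum(axis=0)).sum()    # column marginals b_j
    n_pairs = comb2(len(labels_true))         # C_p: total number of pairs
    expected = sum_a * sum_b / n_pairs        # E[RI]
    max_index = 0.5 * (sum_a + sum_b)         # max(RI)
    return (sum_ij - expected) / (max_index - expected)

def average_precision(per_set_accuracies):
    """AP: mean of the per-dataset accuracies P_i."""
    return sum(per_set_accuracies) / len(per_set_accuracies)
```

    Two identical partitions (up to a relabeling of the clusters) score ARI = 1, while independent partitions score near 0, which is what makes ARI a chance-corrected version of the Rand Index.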
CN201710723583.4A 2017-08-22 2017-08-22 Medical question-answer semantic clustering method based on integrated convolutional coding Active CN107516110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710723583.4A CN107516110B (en) 2017-08-22 2017-08-22 Medical question-answer semantic clustering method based on integrated convolutional coding


Publications (2)

Publication Number Publication Date
CN107516110A true CN107516110A (en) 2017-12-26
CN107516110B CN107516110B (en) 2020-02-18

Family

ID=60723274


Country Status (1)

Country Link
CN (1) CN107516110B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108333959A (en) * 2018-03-09 2018-07-27 清华大学 A kind of energy saving method of operating of locomotive based on convolutional neural networks model
CN108491431A (en) * 2018-02-09 2018-09-04 淮阴工学院 A kind of mixing recommendation method based on self-editing ink recorder and cluster
CN108806785A (en) * 2018-05-29 2018-11-13 四川长虹电器股份有限公司 A kind of diagnosis and treatment section office recommendation method and system based on convolutional neural networks
CN108846503A (en) * 2018-05-17 2018-11-20 电子科技大学 A kind of respiratory disease illness person-time dynamic prediction method neural network based
CN108899064A (en) * 2018-05-31 2018-11-27 平安医疗科技有限公司 Electronic health record generation method, device, computer equipment and storage medium
CN109271898A (en) * 2018-08-31 2019-01-25 电子科技大学 Solution cavity body recognizer based on optimization convolutional neural networks
CN109299274A (en) * 2018-11-07 2019-02-01 南京大学 A kind of natural scene Method for text detection based on full convolutional neural networks
CN109360658A (en) * 2018-11-01 2019-02-19 北京航空航天大学 A kind of the disease pattern method for digging and device of word-based vector model
CN109493931A (en) * 2018-10-25 2019-03-19 平安科技(深圳)有限公司 A kind of coding method of patient file, server and computer readable storage medium
CN109559761A (en) * 2018-12-21 2019-04-02 广东工业大学 A kind of risk of stroke prediction technique based on depth phonetic feature
CN109871531A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Hidden feature extracting method, device, computer equipment and storage medium
CN110134791A (en) * 2019-05-21 2019-08-16 北京泰迪熊移动科技有限公司 A kind of data processing method, electronic equipment and storage medium
CN110210350A (en) * 2019-05-22 2019-09-06 北京理工大学 A kind of quick parking space detection method based on deep learning
CN110222772A (en) * 2019-06-10 2019-09-10 浙江大学 A kind of medical image mark recommended method based on block rank Active Learning
CN110321929A (en) * 2019-06-04 2019-10-11 平安科技(深圳)有限公司 A kind of method, apparatus and storage medium for extracting text feature
CN110313894A (en) * 2019-04-15 2019-10-11 四川大学 Arrhythmia cordis sorting algorithm based on convolutional neural networks
CN110427627A (en) * 2019-08-02 2019-11-08 北京百度网讯科技有限公司 Task processing method and device based on semantic expressiveness model
CN110796251A (en) * 2019-10-28 2020-02-14 天津大学 Image compression optimization method based on convolutional neural network
CN111224677A (en) * 2018-11-27 2020-06-02 华为技术有限公司 Encoding method, decoding method and device
CN111598223A (en) * 2020-05-15 2020-08-28 天津科技大学 Network embedding method based on attribute and structure deep fusion and model thereof
CN111667029A (en) * 2020-07-09 2020-09-15 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN112215267A (en) * 2020-09-25 2021-01-12 天津大学 Hyperspectral image-oriented depth space spectrum subspace clustering method
CN112559707A (en) * 2020-12-16 2021-03-26 四川智仟科技有限公司 Knowledge-driven customer service question and answer method
CN112992367A (en) * 2021-03-23 2021-06-18 崔剑虹 Smart medical interaction method based on big data and smart medical cloud computing system
CN113139061A (en) * 2021-05-14 2021-07-20 东北大学 Case feature extraction method based on word vector clustering
CN113159196A (en) * 2021-04-26 2021-07-23 云南大学 Software demand clustering method and system based on regular variation embedding
CN113284627A (en) * 2021-04-15 2021-08-20 北京交通大学 Medication recommendation method based on patient characterization learning
CN113449491A (en) * 2021-07-05 2021-09-28 思必驰科技股份有限公司 Pre-training framework for language understanding and generation with two-stage decoder
CN113611425A (en) * 2021-07-20 2021-11-05 上海齐网网络科技有限公司 Software definition-based intelligent regional medical treatment integrated database method and system
US20210375404A1 (en) * 2019-06-05 2021-12-02 Boe Technology Group Co., Ltd. Medical question-answering method, medical question-answering system, electronic device, and computer readable storage medium
CN116720523A (en) * 2023-04-19 2023-09-08 贵州轻工职业技术学院 Deep text clustering method and device based on multiple cores and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268532A (en) * 2014-09-30 2015-01-07 郑州轻工业学院 Method for efficiently processing large-scale image and video data under network environment
CN104715047A (en) * 2015-03-26 2015-06-17 浪潮集团有限公司 Social network data collecting and analyzing system
CN105469108A (en) * 2015-11-17 2016-04-06 深圳先进技术研究院 Clustering method, clustering system, clustering result evaluation method and clustering result evaluation system based on biological data
CN105677769A (en) * 2015-12-29 2016-06-15 广州神马移动信息科技有限公司 Keyword recommending method and system based on latent Dirichlet allocation (LDA) model
CN106294398A (en) * 2015-05-21 2017-01-04 富士通株式会社 Information processor and information processing method
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method
CN106874367A (en) * 2016-12-30 2017-06-20 江苏号百信息服务有限公司 A kind of sampling distribution formula clustering method based on public sentiment platform
CN106874489A (en) * 2017-02-21 2017-06-20 烟台中科网络技术研究所 A kind of Lung neoplasm image block search method and device based on convolutional neural networks


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAOLI Z. FERN 等: "Cluster Ensemble Selection", 《HTTPS://DOI.ORG/10.1002/SAM.10008》 *
YONGGANG CAO 等: "AskHERMES: An online question answering system for complex clinical questions", 《JOURNAL OF BIOMEDICAL INFORMATICS》 *
ZHIWEN YU 等: "Adaptive Ensembling of Semi-Supervised Clustering Solutions", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 *
申小敏 等: "基于卷积神经网络的大规模人脸聚类", 《广东工业大学学报》 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant