CN103544392B - Medical gas identification method based on deep learning - Google Patents
Medical gas identification method based on deep learning
- Publication number
- CN103544392B CN103544392B CN201310503402.9A CN201310503402A CN103544392B CN 103544392 B CN103544392 B CN 103544392B CN 201310503402 A CN201310503402 A CN 201310503402A CN 103544392 B CN103544392 B CN 103544392B
- Authority
- CN
- China
- Prior art keywords
- layer
- parameter
- represent
- sigma
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a medical gas identification method based on deep learning. The method uses the raw frequency-response signal directly: the signal is simply normalized and then fed into a stacked autoencoder network, which learns an abstract representation of the raw data through layer-by-layer extraction. The network internally handles processes such as feature extraction, dimensionality reduction and drift suppression, and a classification layer added at the top of the network allows the learned features to enter a classifier directly. Training is divided into two stages, pre-training and fine-tuning, which effectively improves the learning capacity of the network; once training is complete, a new sample fed into the network directly yields a predicted class. The method of the invention automatically extracts effective discriminative features of medical gases, merging steps such as feature extraction, feature selection and drift suppression into one, which greatly simplifies traditional pipelines and improves the efficiency of gas detection and identification.
Description
Technical field
The invention belongs to the technical field of biomedicine, and specifically relates to a medical gas identification method.
Background technology
Machine olfaction is an artificial intelligence system whose basic principle is: odor molecules are adsorbed by a sensor array and produce electrical signals; various signal-processing techniques then extract features, and a computer pattern-recognition system makes a judgment, completing tasks such as gas identification and concentration measurement. The electronic nose is the typical application of machine olfaction and plays a very important role in the medical field, for example in diagnosing certain diseases, identifying bacterial species in blood, and detecting gases harmful to the respiratory system.
Gas detection and identification by sensing have important applications in the medical field. For example, electronic-nose equipment can collect sample data from the oral cavity, thoracic cavity or blood; various signal-processing techniques then analyze and process the data, and a computer pattern-recognition system makes a judgment, accomplishing tasks such as disease diagnosis, pathogen identification and drug-concentration determination.
Traditional gas detection and identification methods generally comprise steps such as feature extraction and feature selection, and finally reach the preset goal by means of classification, regression or clustering. For equipment intended for long-term use, effective sensor drift-compensation techniques are also required to suppress the influence of drift. In medical applications these conventional procedures are complex and rather inefficient, and a compromise usually has to be made between accuracy and real-time performance.
The data obtained by sensor sampling can be regarded as a time-series signal; this signal has a complicated structure, is difficult to interpret, and is often very high-dimensional. To identify it well, features usually have to be designed according to various attributes of the signal, then passed through feature selection, such as dimensionality reduction, and only then used as the input of a classification algorithm such as a support vector machine.
Sensor drift refers to the slow, random change of a sensor's response over time. This change means that the pattern a recognition system has currently learned cannot be applied well to subsequent test samples, so the accuracy of gas detection and identification gradually degrades. In medical applications there are generally two countermeasures against sensor drift: (1) develop an effective drift-compensation technique; this process is usually separate from feature extraction, complicated to operate, and inefficient; (2) because drift is small over a short period, periodically maintain and recalibrate the electronic-nose equipment to keep the sampled data reliable and stable, but this undoubtedly increases cost substantially and shortens the service life of the equipment.
In fact, some well-designed features are inherently robust to drift. From this perspective, sensor drift can be suppressed simply by extracting better features, thereby fusing the two processes together. Deep learning builds artificial neural networks with multiple hidden layers that analyze, learn and interpret data in a way that imitates the human brain; it can obtain highly abstract representations of data and is good at discovering latent patterns, which makes it well suited to the problems above.
The document "M. Trincavelli, S. Coradeschi, A. Loutfi, B. P. Thunberg, Direct identification of bacteria in blood culture samples using an electronic nose, IEEE Trans. Biomedical Engineering 57(12), 2884-2890, 2010" proposes an effective method for identifying pathogens in blood culture specimens. The method first obtains sample data by sampling with electronic-nose equipment, then performs feature extraction and dimensionality reduction, and finally completes classification with a support vector machine; in the feature-extraction part, two methods are adopted for the overall signal waveform: steady-state response and response derivative.
Sometimes, to obtain higher recognition accuracy on complicated problems, the signal waveform must be analyzed more carefully and higher-dimensional features extracted. The document "A. Vergara, S. Vembu, T. Ayhan, M. A. Ryan, M. L. Homer and R. Huerta, Chemical gas sensor drift compensation using classifier ensembles, Sensors and Actuators B: Chemical, vol. 166-167, pp. 320-329, May 2012" studies how to improve the recognition accuracy of gases such as ethanol under drift, and designs eight different kinds of features.
Once the classification algorithm is fixed, the recognition accuracy for a gas depends solely on the quality of the features. Compared with the raw frequency-response values, well-designed features can greatly reduce dimensional redundancy while highlighting the differences between classes, and usually yield fairly good recognition accuracy.
However, hand-designed features usually target a specific application scenario (gas type, sensor type, external environment, etc.); they are therefore highly purpose-specific and generalize poorly. Moreover, because of sensor cross-sensitivity, the final feature dimensionality is often very high, so an efficient dimensionality-reduction algorithm such as PCA or LDA is usually needed. If, in some new application, none of the existing features reaches the required recognition accuracy, better features must be designed, which further increases the complexity of the task.
At present, the most efficient way to suppress drift is drift compensation by periodic recalibration. The general idea is to find a linear transformation that normalizes the sensor response, so that the classifier can be applied directly to the transformed data.
CN1514239A discloses a method for detecting and correcting gas-sensor drift. By combining principal component analysis with wavelet-transform techniques, the method improves the sensitivity and accuracy of drift detection. For a sensor on which drift is detected, a correction method based on an adaptive drift model corrects the sensor output online, while the drift model itself can be updated online, thereby improving the reliability of the sensing system and extending its service life.
The document "T. Artursson, T. Eklöv, I. Lundström, P. Mårtensson, M. Sjöström and M. Holmberg, Drift correction for gas sensors using multivariate methods, J. Chemom., vol. 14, no. 5-6, pp. 711-723, 2000" uses a reference gas to approximately estimate the drift direction, and then corrects the response to the gas under analysis accordingly.
However, these methods assume that the sensor's drift trajectory is linear, which has not been confirmed, and they usually require a reference gas whose chemical properties are very stable and whose behavior on the sensors closely resembles that of the gas under analysis; this condition is undoubtedly very harsh in practice. In addition, these methods are quite complicated to operate in practical applications and rather inefficient.
Summary of the invention
The object of the invention is to simplify traditional gas detection and identification methods and to develop a gas detection and identification method that is simpler, more efficient and more robust to sensor drift.
The scheme of the invention uses the raw frequency-response signal directly: the signal is simply normalized and then fed into a stacked autoencoder network, which learns an abstract representation of the raw data through layer-by-layer extraction. The network internally handles processes such as feature extraction, dimensionality reduction and drift suppression, and a classification layer added at the top allows the learned features to enter the classifier directly. Training is divided into two stages, pre-training and fine-tuning, which effectively improves the learning capacity of the network; once training is complete, a new sample fed into the network directly yields a predicted class.
The technical scheme of the invention is a medical gas identification method based on deep learning, comprising the following steps:
Step 1. Data normalization. Suppose there are m samples, each organized as v = [s1, s2, ..., st], where si is the i-th frequency-response value and there are t response values in total. The whole gas data set V (whose i-th row vi represents the i-th sample) and the corresponding labels can be expressed as:
Y = [y1, y2, ..., yi, ..., ym]^T
where T denotes the transpose of a vector, and the i-th element of Y is the class label of the corresponding sample;
Using the formula V'(i,j) = L + (U − L)·(V(i,j) − min_i)/(max_i − min_i), the data set is normalized to [0, 1], where V(i,j) is the i-th frequency-response value of the j-th sample, L is the lower bound of the normalization (its value is 0), U is the upper bound (its value is 1), and max_i and min_i are the maximum and minimum of each row of the matrix. The normalized data set is denoted a(0);
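The per-row min-max scaling of step 1 can be sketched as follows. This is an illustration only, not part of the patent; the helper name `normalize_rows` and the features-in-rows layout (following the V(i,j) indexing above) are assumptions:

```python
import numpy as np

def normalize_rows(V, L=0.0, U=1.0):
    """Scale each row of V to [L, U] using that row's min and max,
    i.e. V'(i,j) = L + (U - L) * (V(i,j) - min_i) / (max_i - min_i)."""
    mins = V.min(axis=1, keepdims=True)   # min_i of each row
    maxs = V.max(axis=1, keepdims=True)   # max_i of each row
    return L + (U - L) * (V - mins) / (maxs - mins)

# Toy matrix: rows are frequency responses, columns are samples.
A0 = normalize_rows(np.array([[1.0, 3.0, 2.0],
                              [10.0, 20.0, 30.0]]))
```

After this call every row of A0 spans exactly [0, 1], matching the data set a(0) of the method.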
Step 2. Pre-train the stacked autoencoder network. In the stacked autoencoder network, v, h and y denote the input layer, hidden layer and output layer respectively, W(i) is the weight matrix connecting the layers, and b(i) is the bias vector of the hidden layer;
Step 2.1. Train the first layer, i.e. the first autoencoder. The objective function is:
J(W, b) = (1/m) Σ_{i=1..m} (1/2)||vi − v̂i||² + (λ/2) Σ_{i,j} (Wij)² + β Σ_j KL(ρ || pj)
where the first term is the reconstruction error, measuring how much the output differs from the input: vi is the i-th input sample after the normalization of step 1, and v̂i is its output at the output layer after passing through the network. The second term is the weight-decay term, which reduces the magnitude of the weights to prevent over-fitting; Wij is the weight between the j-th unit of the current layer and the i-th unit of the next layer. The third term is the sparsity penalty, with KL(ρ || pj) = ρ·log(ρ/pj) + (1−ρ)·log((1−ρ)/(1−pj)), where pj is the average activation of hidden unit j; λ, β and ρ are preset parameters. m is the number of samples and J is the objective function of the first autoencoder;
To optimize the objective function for an n-layer autoencoder, the concrete optimization steps are:
Step 2.1.1. Randomly initialize the parameters W(i), b(i), and initialize the all-zero matrix and vector, i.e. ΔW(i) = 0, Δb(i) = 0;
Step 2.1.2. For each sample, use the back-propagation algorithm to compute the partial derivatives ∇W(i)J and ∇b(i)J. The detailed process is:
Feedforward computation obtains each layer's activation a(i) by the formula a(i) = σ(W(i)a(i−1) + b(i)), where σ(x) = 1/(1 + e^(−x)) is the sigmoid function, whose output range is [0, 1];
For the output layer, compute the residual: δ(n) = −(v − a(n))·σ′(z(n)), where "·" denotes the element-wise product, z(n) = W(n−1)a(n−1) + b(n−1), and σ′ is the derivative of σ(x);
For each layer l = n−1, n−2, ..., 2, compute: δ(l) = ((W(l))^T δ(l+1))·σ′(z(l));
Compute the partial-derivative values: ∇W(l)J = δ(l+1)·(a(l))^T and ∇b(l)J = δ(l+1), where ∇W(i)J denotes the partial derivative of J(W, b) with respect to W(i) and ∇b(i)J that with respect to b(i);
Step 2.1.3. Add the obtained partial derivatives to ΔW(i) and Δb(i) respectively, i.e. ΔW(i) := ΔW(i) + ∇W(i)J, Δb(i) := Δb(i) + ∇b(i)J;
Step 2.1.4. Update the parameters: W(i) := W(i) − α((1/m)ΔW(i) + λW(i)), b(i) := b(i) − α(1/m)Δb(i), where α is the learning rate;
Step 2.1.5. Repeat steps 2.1.2 to 2.1.4, gradually reducing the value of the objective function until the set threshold is reached; this yields the encoding-layer parameters (W, b) and the decoding-layer parameters (Ŵ, b̂);
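As an illustration only (not the patent's reference implementation), steps 2.1.1 to 2.1.5 for a single-hidden-layer sparse autoencoder can be sketched in vectorized form; the function name, hyperparameter values and random initialization scale are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden, lam=1e-4, beta=0.1, rho=0.05,
                             alpha=0.5, iters=200, seed=0):
    """Gradient descent on the sparse-autoencoder objective.
    X: (n_features, m) normalized samples in columns.
    Returns the encoding-layer parameters (W1, b1)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.normal(0, 0.1, (n_hidden, n)); b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(0, 0.1, (n, n_hidden)); b2 = np.zeros((n, 1))
    for _ in range(iters):
        # feedforward: hidden activation and reconstruction
        a1 = sigmoid(W1 @ X + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        p_hat = a1.mean(axis=1, keepdims=True)   # average activation p_j
        # output-layer residual (reconstruction error term)
        d2 = -(X - a2) * a2 * (1 - a2)
        # hidden-layer residual with the KL sparsity term folded in
        sparse = beta * (-rho / p_hat + (1 - rho) / (1 - p_hat))
        d1 = (W2.T @ d2 + sparse) * a1 * (1 - a1)
        # averaged gradients plus weight decay, then the update of step 2.1.4
        W2 -= alpha * (d2 @ a1.T / m + lam * W2)
        b2 -= alpha * d2.mean(axis=1, keepdims=True)
        W1 -= alpha * (d1 @ X.T / m + lam * W1)
        b1 -= alpha * b1_grad if False else alpha * d1.mean(axis=1, keepdims=True)
    return W1, b1
```

Only (W1, b1) is returned, matching step 2.2 where the decoding layer is discarded after training.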
Step 2.2. After training, discard the decoding layer (Ŵ, b̂) and use the encoding-layer parameters (W, b) as the initial parameters of the corresponding level of the stacked autoencoder network, i.e. W(1) = W, b(1) = b;
Step 2.3. Compute the hidden-layer activation of the current autoencoder: a(1) = σ(W(1)a(0) + b(1));
Step 2.4. Train the second layer, i.e. the second autoencoder, on the activation a(1). The hidden layer of the first autoencoder serves as the input layer of the second autoencoder, and the training process is identical to that of the first layer except that the input becomes a(1). Training yields the initial parameters W(2), b(2) of the second layer of the network and the hidden-layer activation a(2);
Step 2.5. For the third through n-th layers, repeat the process of steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer and finally the activation a(n) of the n-th hidden layer; this activation also serves as the input of the softmax layer and is denoted aS;
Step 2.6. Use the aS obtained in step 2.5 and the labels Y to train the last layer of the network, i.e. the softmax classifier, obtaining the initial parameter WS of the last layer;
For convenience, denote aS by x and WS by θ, and assume there are k classes in total. For the i-th sample, the predicted probability that its class label is j is:
P(yi = j | xi; θ) = exp(θj^T xi) / Σ_{l=1..k} exp(θl^T xi)
where θj denotes the j-th row of θ, i.e. the row vector of weights connecting the j-th output unit with all input units; l is an index with 1 ≤ l ≤ k, and k is the number of classes, determined by the softmax-layer input aS and the initial softmax parameter WS; xi is the softmax-layer input for the i-th sample. The final output is a probability column vector P whose j-th component is the probability that the sample is judged to belong to class j. The weight matrix θ is trained by minimizing the loss function:
J(θ) = −(1/m) Σ_{i=1..m} Σ_{j=1..k} 1{yi = j}·log P(yi = j)
where log P(yi = j) denotes the natural logarithm of the probability P(yi = j), and 1{·} is the indicator function, equal to 1 when the condition in the braces is true and 0 otherwise; m is the number of samples and n is the number of layers of the autoencoder;
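The class-probability formula and the loss J(θ) of step 2.6 can be written out as a short sketch. This is illustrative only; the max-shift for numerical stability is an implementation detail not stated in the patent:

```python
import numpy as np

def softmax_probs(theta, X):
    """theta: (k, d) weight matrix; X: (d, m) inputs in columns.
    Returns the (k, m) matrix of class probabilities P(y_i = j)."""
    Z = theta @ X
    Z = Z - Z.max(axis=0, keepdims=True)   # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def softmax_loss(theta, X, y):
    """Negative mean log-likelihood -(1/m) sum log P(y_i = correct class);
    y holds integer labels 0..k-1."""
    P = softmax_probs(theta, X)
    m = X.shape[1]
    return -np.log(P[y, np.arange(m)]).mean()
```

With θ = 0 every class gets probability 1/k and the loss equals log k, a useful sanity check before training.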
Step 3. Fine-tuning: regard the network as a whole, compute the partial derivatives of each layer's parameters with back-propagation, and then optimize iteratively with gradient descent. The detailed process is:
Step 3.1. Use the formula a(i) = σ(W(i)a(i−1) + b(i)) to perform the feedforward computation and obtain each layer's activation a(i);
Step 3.2. Compute the partial derivative of the softmax-layer parameter WS: ∇WS J = (1/m)(P − E)·(aS)^T, where P is the conditional probability matrix computed as in step 2.6 and E is the indicator matrix with E(j,i) = 1{yi = j};
Step 3.3. Compute the residual of the last hidden layer: δ(n) = (WS^T(P − E))·σ′(z(n)), where WS^T(P − E) is the partial derivative of J with respect to a(n), and a(n) is the activation of the n-th hidden layer;
Step 3.4. For each layer l = n−1, n−2, ..., 2, compute: δ(l) = ((W(l))^T δ(l+1))·σ′(z(l));
Step 3.5. Compute the partial derivative of each hidden layer: ∇W(l)J = δ(l+1)·(a(l))^T, ∇b(l)J = δ(l+1);
Step 3.6. Use the partial derivatives obtained above to update each layer's parameters: WS := WS − α∇WS J, W(i) := W(i) − α∇W(i)J, b(i) := b(i) − α∇b(i)J, where ∇WS J denotes the partial derivative of J with respect to WS, WS is the initial parameter of the softmax classifier, ∇W(i)J denotes the partial derivative of J(W, b) with respect to W(i), and ∇b(i)J that with respect to b(i);
Step 3.7. Repeat the above steps, reducing the value of the objective function by iteration until the set threshold is reached;
Step 4. Predict the class of a test sample. The detailed process is:
Step 4.1. Normalize the test sample vp to [0, 1];
Step 4.2. For the hidden layers, use the formula a(i) = σ(W(i)a(i−1) + b(i)) to perform the feedforward computation layer by layer, obtaining the softmax-layer input aS;
Step 4.3. Compute the conditional probability vector P according to the probability formula of step 2.6; the class corresponding to its largest component is the predicted class of the sample.
In all the formulas above, i and j are counting indices.
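Step 4's prediction pass, under the assumption of sigmoid hidden layers topped by a softmax layer as described above, might look like the following sketch; the function name `predict` and the toy weights in the usage are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(weights, biases, theta, v):
    """Feed a normalized sample v (column vector) through the stacked
    encoder layers (weights[i], biases[i]) and the softmax weights theta.
    Returns (predicted class index, probability column vector)."""
    a = v
    for W, b in zip(weights, biases):        # step 4.2: layer-by-layer feedforward
        a = sigmoid(W @ a + b)
    z = theta @ a                            # step 4.3: softmax probabilities
    z = z - z.max()
    p = np.exp(z) / np.exp(z).sum()
    return int(np.argmax(p)), p

# Toy usage with a single 3->2 hidden layer and 2 classes.
w = [np.zeros((2, 3))]; b = [np.zeros((2, 1))]
theta = np.array([[1.0, 0.0], [0.0, 0.0]])
cls, p = predict(w, b, theta, np.ones((3, 1)))
```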
The beneficial effects of the method: the invention designs a network structure adapted to medical gas signal processing, in which features are extracted layer by layer from the input samples, so that what finally enters the classification layer has relatively low dimensionality and good robustness to drift. Compared with traditional feature-extraction methods, this method can automatically extract effective discriminative features of medical gases, merging steps such as feature extraction, feature selection and drift suppression into one, greatly simplifying traditional pipelines and improving the efficiency of gas detection and identification. This is embodied in the following aspects:
(1) Except for training the softmax classifier in step 2, no other process requires class labels, so the feature-extraction process is unsupervised; if labeled samples are scarce, a large number of unlabeled samples can be used to train all layers before the classification layer, with a small number of labeled samples used for the final fine-tuning;
(2) The network structure shows that each layer has fewer units than the preceding one, so the input dimensionality that finally enters the classifier is low, much smaller than the original input; this can be regarded as a dimensionality-reduction process;
(3) Feature extraction is fully automatic with no manual intervention, eliminating the complexity of hand-designing features while offering wide applicability;
(4) The extracted features are very robust to drift, effectively improving the accuracy of gas detection and identification under drift and extending the service life of the equipment.
Description of the drawings
Fig. 1 is a flow diagram of the medical gas identification method of an embodiment of the invention.
Fig. 2 shows the stacked autoencoder network for medical gas identification of an embodiment of the invention.
Fig. 3 shows an autoencoder with one hidden layer of an embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are further described below with reference to the drawings.
The overall flow of the gas identification method of the invention is shown in Fig. 1:
Step 1. Data normalization. Suppose there are m samples, each organized as v = [s1, s2, ..., st], where si is the i-th frequency-response value and there are t response values in total. The whole gas data set V (whose i-th row vi represents the i-th sample) and the corresponding labels can be expressed as:
Y = [y1, y2, ..., yi, ..., ym]^T
where T denotes the transpose of a vector, and the i-th element of Y is the class label of the corresponding sample.
Using the formula V'(i,j) = L + (U − L)·(V(i,j) − min_i)/(max_i − min_i), the data set is normalized to [0, 1], where V(i,j) is the i-th frequency-response value of the j-th sample, L is the lower bound of the normalization (its value is 0), U is the upper bound (its value is 1), and max_i and min_i are the maximum and minimum of each row of the matrix. The normalized data set is denoted a(0).
Step 2. Pre-train the stacked autoencoder network. In the stacked autoencoder network, v, h and y denote the input layer, hidden layer and output layer respectively, W(i) is the weight matrix connecting the layers, and b(i) is the bias vector of the hidden layer.
The invention uses a network structure similar to that of Fig. 2; depending on the specific task, the number of layers of the network and the number of units per layer can be changed, so the corresponding parameter forms also change.
Such a network is often very deep with numerous parameters and is difficult to train directly, so the method of pre-training is first used to train the parameters of each layer one by one. Compared with random initialization, pre-training places each layer's parameters at a better position in parameter space.
Apart from the softmax layer used for classification, the remainder of the network can be regarded as a stack of several single-hidden-layer autoencoders, in which the output of one layer is connected to the input of the next. Such an autoencoder obtains the hidden-unit activations by reconstructing the input (the reconstruction is denoted with the symbol ^) and uses them as the feature representation of the original input, as shown in Fig. 3.
After training, each autoencoder retains only the encoding-layer parameters, i.e. W and b, as the initial parameters of the corresponding level of the stacked autoencoder network. The detailed process is as follows:
Step 2.1. Train the first layer, i.e. the first autoencoder. The objective function is:
J(W, b) = (1/m) Σ_{i=1..m} (1/2)||vi − v̂i||² + (λ/2) Σ_{i,j} (Wij)² + β Σ_j KL(ρ || pj)
where the first term is the reconstruction error, measuring how much the output differs from the input: vi is the i-th input sample after the normalization of step 1, and v̂i is its output at the output layer after passing through the network. The second term is the weight-decay term, which reduces the magnitude of the weights to prevent over-fitting; Wij is the weight between the j-th unit of the current layer and the i-th unit of the next layer. The third term is the sparsity penalty, in which pj is the average activation of hidden unit j, and λ, β and ρ are preset parameters; its purpose is to push the average activation of every hidden unit toward a small number ρ, that is, to keep only a few hidden units activated at any time. m is the number of samples and J is the objective function of the first autoencoder.
The objective function is optimized by gradient descent; each iteration requires the partial derivatives of every parameter, and this computation is completed by the back-propagation algorithm.
To optimize the objective function for an n-layer autoencoder, the concrete optimization steps are:
Step 2.1.1. Randomly initialize the parameters W(i), b(i), and initialize the all-zero matrix and vector, i.e. ΔW(i) = 0, Δb(i) = 0;
Step 2.1.2. For each sample, use the back-propagation algorithm to compute the partial derivatives ∇W(i)J and ∇b(i)J. The detailed process is:
Feedforward computation obtains each layer's activation a(i) by the formula a(i) = σ(W(i)a(i−1) + b(i)), where σ(x) = 1/(1 + e^(−x)) is the sigmoid function, whose output range is [0, 1];
For the output layer, compute the residual: δ(n) = −(v − a(n))·σ′(z(n)), where "·" denotes the element-wise product, z(n) = W(n−1)a(n−1) + b(n−1), and σ′ is the derivative of σ(x);
For each layer l = n−1, n−2, ..., 2, compute: δ(l) = ((W(l))^T δ(l+1))·σ′(z(l));
Compute the partial-derivative values: ∇W(l)J = δ(l+1)·(a(l))^T and ∇b(l)J = δ(l+1), where ∇W(i)J denotes the partial derivative of J(W, b) with respect to W(i) and ∇b(i)J that with respect to b(i).
Step 2.1.3. Add the obtained partial derivatives to ΔW(i) and Δb(i) respectively, i.e. ΔW(i) := ΔW(i) + ∇W(i)J, Δb(i) := Δb(i) + ∇b(i)J;
Step 2.1.4. Update the parameters: W(i) := W(i) − α((1/m)ΔW(i) + λW(i)), b(i) := b(i) − α(1/m)Δb(i), where α is the learning rate.
Step 2.1.5. Repeat steps 2.1.2 to 2.1.4, gradually reducing the value of the objective function until the set threshold is reached. At this point the encoding-layer parameters (W, b) and the decoding-layer parameters (Ŵ, b̂) are obtained.
Step 2.2. After training, discard the decoding layer (Ŵ, b̂) and use the encoding-layer parameters (W, b) as the initial parameters of the corresponding level of the stacked autoencoder network, i.e. W(1) = W, b(1) = b.
Step 2.3. Compute the hidden-layer activation of the current autoencoder: a(1) = σ(W(1)a(0) + b(1)).
Step 2.4. Train the second layer, i.e. the second autoencoder, on the activation a(1). The hidden layer of the first autoencoder serves as the input layer of the second autoencoder, and the training process is identical to that of the first layer except that the input becomes a(1). Training yields the initial parameters W(2), b(2) of the second layer of the network and the hidden-layer activation a(2).
Step 2.5. For the third through n-th layers, repeat the process of steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer and finally the activation a(n) of the n-th hidden layer; this activation also serves as the input of the softmax layer and is denoted aS. The autoencoder stack in this embodiment has 3 layers.
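Greedy layer-wise pre-training (steps 2.2 to 2.5) can be sketched as below. For brevity this illustration omits the weight-decay and sparsity terms of the full objective, so it is a simplified assumption rather than the patent's exact procedure; the function names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(A, n_hidden, alpha=0.5, iters=200, seed=0):
    """Train one plain autoencoder on activations A (d, m); return the
    encoder (W, b) and the hidden activations fed to the next layer."""
    rng = np.random.default_rng(seed)
    d, m = A.shape
    W1 = rng.normal(0, 0.1, (n_hidden, d)); b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(0, 0.1, (d, n_hidden)); b2 = np.zeros((d, 1))
    for _ in range(iters):
        h = sigmoid(W1 @ A + b1)
        r = sigmoid(W2 @ h + b2)             # reconstruction of A
        d2 = -(A - r) * r * (1 - r)          # output-layer residual
        d1 = (W2.T @ d2) * h * (1 - h)       # hidden-layer residual
        W2 -= alpha * (d2 @ h.T) / m; b2 -= alpha * d2.mean(axis=1, keepdims=True)
        W1 -= alpha * (d1 @ A.T) / m; b1 -= alpha * d1.mean(axis=1, keepdims=True)
    return W1, b1, sigmoid(W1 @ A + b1)

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise pre-training: each autoencoder is trained on the
    previous layer's activations; the decoder is discarded each time."""
    params, A = [], X
    for n_hidden in layer_sizes:
        W, b, A = pretrain_layer(A, n_hidden)
        params.append((W, b))
    return params, A        # final A is the softmax-layer input aS
```

Calling `pretrain_stack(X, [4, 3])` on a 6-feature data set yields two encoder layers and a 3-dimensional aS, mirroring the shrinking layer widths the method relies on for dimensionality reduction.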
Step 2.6. Use the aS obtained in step 2.5 and the labels Y to train the last layer of the network, i.e. the softmax classifier, obtaining the initial parameter WS of the last layer.
Softmax regression is the generalization of logistic regression to multi-class problems. For convenience, denote aS by x and WS by θ, and assume there are k classes in total. For the i-th sample, the predicted probability that its class label is j is:
P(yi = j | xi; θ) = exp(θj^T xi) / Σ_{l=1..k} exp(θl^T xi)
where θj denotes the j-th row of θ, i.e. the row vector of weights connecting the j-th output unit with all input units; l is an index with 1 ≤ l ≤ k, and k is the number of classes; xi is the softmax-layer input for the i-th sample. The final output is a probability column vector P whose j-th component is the probability that the sample is judged to belong to class j. The weight matrix θ is trained by minimizing the loss function:
J(θ) = −(1/m) Σ_{i=1..m} Σ_{j=1..k} 1{yi = j}·log P(yi = j)
where log P(yi = j) denotes the natural logarithm of the probability, and 1{·} is the indicator function, equal to 1 when the condition in the braces is true and 0 otherwise. This loss function is strictly convex, so optimization algorithms such as gradient descent or L-BFGS can find the global optimum. m is the number of samples and n is the number of layers of the autoencoder.
Here every parameter connecting two layers is a weight matrix; WS, i.e. θ, is the weight matrix connecting the last two layers.
In this embodiment, the detailed process of training the weight matrix θ by minimizing the loss function is as follows:
Step 2.6.1. Randomly initialize the parameter matrix θ;
Step 2.6.2. Directly compute the derivative of J(θ): ∇θj J(θ) = −(1/m) Σ_{i=1..m} xi·(1{yi = j} − P(yi = j)), where θj is the j-th row of the matrix;
Step 2.6.3. Update the parameter θ: θj := θj − α·∇θj J(θ), where α is the learning rate and ∇θj J(θ) denotes the partial derivative of J(θ) with respect to θj;
Step 2.6.4. Repeat steps 2.6.2 and 2.6.3, gradually reducing the value of J(θ) until the set threshold is reached; the θ obtained at that point is the final weight matrix, namely WS.
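Steps 2.6.1 to 2.6.4 amount to batch gradient descent on J(θ). A minimal sketch follows; the function name, learning rate and iteration count are assumptions:

```python
import numpy as np

def train_softmax(X, y, k, alpha=0.5, iters=300, seed=0):
    """Batch gradient descent for softmax regression.
    X: (d, m) hidden-layer activations aS; y: (m,) integer labels 0..k-1."""
    rng = np.random.default_rng(seed)
    d, m = X.shape
    theta = rng.normal(0, 0.01, (k, d))
    E = np.zeros((k, m)); E[y, np.arange(m)] = 1.0   # indicator 1{y_i = j}
    for _ in range(iters):
        Z = theta @ X
        Z = Z - Z.max(axis=0, keepdims=True)
        P = np.exp(Z); P /= P.sum(axis=0, keepdims=True)
        grad = -(E - P) @ X.T / m          # step 2.6.2: derivative of J(theta)
        theta -= alpha * grad              # step 2.6.3: gradient-descent update
    return theta
```

On a tiny separable problem the learned θ should classify the training points correctly, which serves as a quick check of the update direction.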
Step 3. Fine-tuning: regard the network as a whole, compute the partial derivatives of each layer's parameters with back-propagation, and then optimize iteratively with gradient descent.
After pre-training, the initial parameters of every layer of the network are determined; all parameters now undergo one round of fine-tuning to improve the classification ability of the network. Fine-tuning regards the network as a whole, computes the partial derivatives of each layer's parameters with back-propagation, and then optimizes iteratively with gradient descent. At this stage the network no longer reconstructs its input, so the objective function is the same as that of the softmax layer; the softmax layer is regarded as one additional layer and handled separately, while the optimization of each hidden layer is essentially the same as described in step 2.1.
The detailed process is as follows:
Step 3.1. Use the formula a(i) = σ(W(i)a(i−1) + b(i)) to perform the feedforward computation and obtain each layer's activation a(i);
Step 3.2. Compute the partial derivative of the softmax-layer parameter WS: ∇WS J = (1/m)(P − E)·(aS)^T, where P is the conditional probability matrix computed as in step 2.6 and E is the indicator matrix with E(j,i) = 1{yi = j};
Step 3.3. Compute the residual of the last hidden layer: δ(n) = (WS^T(P − E))·σ′(z(n)), where WS^T(P − E) is the partial derivative of J with respect to a(n), and a(n) is the activation of the n-th hidden layer;
Step 3.4. For each layer l = n−1, n−2, ..., 2, compute: δ(l) = ((W(l))^T δ(l+1))·σ′(z(l));
Step 3.5. Compute the partial derivative of each hidden layer: ∇W(l)J = δ(l+1)·(a(l))^T, ∇b(l)J = δ(l+1);
Step 3.6. Use the partial derivatives obtained above to update each layer's parameters: WS := WS − α∇WS J, W(i) := W(i) − α∇W(i)J, b(i) := b(i) − α∇b(i)J, where ∇WS J denotes the partial derivative of J with respect to WS, WS is the initial parameter of the softmax classifier, ∇W(i)J denotes the partial derivative of J(W, b) with respect to W(i), and ∇b(i)J that with respect to b(i);
Step 3.7. Repeat the above steps, reducing the value of the objective function by iteration until the set threshold is reached;
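One fine-tuning iteration (steps 3.1 to 3.6) over the whole stack plus the softmax layer might be sketched as follows; this is an illustrative reading of the procedure, with the indicator written as a one-hot array and the function name assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_step(params, theta, X, y, alpha=0.1):
    """One backpropagation pass through hidden layers `params` (list of
    (W, b)) and softmax weights `theta`; returns updated parameters."""
    m = X.shape[1]
    # step 3.1: feedforward, keeping every layer's activation
    acts = [X]
    for W, b in params:
        acts.append(sigmoid(W @ acts[-1] + b))
    Z = theta @ acts[-1]
    Z = Z - Z.max(axis=0, keepdims=True)
    P = np.exp(Z); P /= P.sum(axis=0, keepdims=True)
    E = np.zeros_like(P); E[y, np.arange(m)] = 1.0     # indicator 1{y_i = j}
    # step 3.2: softmax-layer gradient
    g_theta = (P - E) @ acts[-1].T / m
    # step 3.3: residual of the last hidden layer
    delta = (theta.T @ (P - E)) * acts[-1] * (1 - acts[-1])
    # steps 3.4-3.6: propagate residuals back and update each hidden layer
    new_params = []
    for (W, b), a_prev in zip(reversed(params), reversed(acts[:-1])):
        gW = delta @ a_prev.T / m
        gb = delta.mean(axis=1, keepdims=True)
        delta = (W.T @ delta) * a_prev * (1 - a_prev)  # residual for layer below
        new_params.append((W - alpha * gW, b - alpha * gb))
    return list(reversed(new_params)), theta - alpha * g_theta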
Step 4. Predict the class of a test sample. The detailed process is as follows:
Step 4.1. Normalize the test sample vp to [0, 1];
Step 4.2. For the hidden layers, use the formula a(i) = σ(W(i)a(i−1) + b(i)) to perform the feedforward computation layer by layer, obtaining the softmax-layer input aS;
Step 4.3. Compute the conditional probability vector P according to the probability formula of step 2.6; the class corresponding to its largest component is the predicted class of the sample.
In all the formulas above, i and j are counting indices.
The core of the invention is a network structure adapted to medical gas signal processing: deep learning is used to process the patient gas data sampled by the electronic nose, automatically extracting features that are more general and more robust to sensor drift, so that the task of gas detection and identification is completed simply and effectively. This undoubtedly has great practical value in medical fields that demand both accuracy and real-time performance.
Claims (2)
1. a medical science Gas Distinguishing Method based on degree of depth study, specifically comprises the following steps that
Step 1. Data normalization: given m samples, organize each sample in the following form, v = [s1, s2, ..., st], where si is the i-th frequency response value and there are t response values in total; the whole gas data set and the corresponding labels can be expressed as:
V = [v1, v2, …, vi, …, vm]T, Y = [y1, y2, …, yi, …, ym]T
where T denotes the transpose of a vector; the i-th row vi of matrix V represents the i-th sample, and the i-th element of Y is the class label of the corresponding sample;
Normalize the data set to [0, 1] using the formula v′i,j = L + (U − L)(vi,j − mini)/(maxi − mini),
where vi,j is the i-th frequency response value of the j-th sample, L is the lower bound of the normalization (its value is 0), U is the upper bound of the normalization (its value is 1), and maxi and mini are the maximum and minimum of each row of the matrix; the data set after normalization is denoted a(0);
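The per-row min-max scaling of step 1 can be sketched like this (the helper name `normalize` and the use of NumPy are assumptions for illustration):

```python
import numpy as np

def normalize(V, L=0.0, U=1.0):
    """Min-max scale each row of V into [L, U] using that row's min and max."""
    mins = V.min(axis=1, keepdims=True)   # min_i of each row
    maxs = V.max(axis=1, keepdims=True)   # max_i of each row
    return L + (U - L) * (V - mins) / (maxs - mins)
```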
Step 2. Pre-train the stacked autoencoder network; in said stacked autoencoder network, v, h and y represent the input layer, hidden layer and output layer respectively, W(i) is the weight matrix connecting the layers, and b(i) is the bias vector of a hidden layer;
Step 2.1. Train the first layer, i.e. the first autoencoder, whose objective function is:
J = (1/m)·Σ_{i=1..m} (1/2)‖v̂i − vi‖² + (λ/2)·Σ_{i,j} (Wij)² + β·Σ_{j} KL(ρ ∥ pj)
where the first term is the reconstruction error term, representing the degree of difference between input and output; vi denotes the i-th input sample after the normalization of step 1, and v̂i denotes the output at the output layer after sample vi has passed through the network; the second term is called the weight decay term and serves to reduce the magnitude of the weights and prevent overfitting, where Wij denotes the weight between the j-th unit of the current layer and the i-th unit of the next layer; the third term is the sparsity penalty term, where pj denotes the average activation of hidden unit j, and λ, β and ρ are preset parameters; m is the number of samples, and J denotes the objective function of the first autoencoder;
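A minimal sketch of the three-term objective above, assuming a single-hidden-layer autoencoder and the standard KL-divergence form of the sparsity penalty (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_loss(V, W1, b1, W2, b2, lam=1e-4, beta=3.0, rho=0.05):
    """V: t x m data (one sample per column). Returns the scalar objective J."""
    m = V.shape[1]
    H = sigmoid(W1 @ V + b1)                  # hidden activations
    R = sigmoid(W2 @ H + b2)                  # reconstruction v-hat
    recon = np.sum((R - V) ** 2) / (2.0 * m)  # reconstruction error term
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # weight decay term
    p = H.mean(axis=1)                        # average activation p_j of hidden unit j
    kl = np.sum(rho * np.log(rho / p) + (1 - rho) * np.log((1 - rho) / (1 - p)))
    return recon + decay + beta * kl          # sparsity penalty weighted by beta
```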
Optimize the objective function; for an n-layer autoencoder, the concrete optimization steps are:
Step 2.1.1. Randomly initialize the parameters W(i) and b(i), and initialize all-zero matrices and vectors, i.e. ΔW(i) = 0, Δb(i) = 0;
Step 2.1.2. For each sample, use the back-propagation algorithm to compute the partial derivatives of J(W, b) with respect to W(i) and b(i); the detailed process is as follows:
Feedforward computation obtains the excitation a(i) of each layer using the formula a(i) = σ(W(i)a(i-1) + b(i)), where σ(x) = 1/(1 + e^(−x)) is the sigmoid function, whose output range is [0, 1];
For the output layer, compute the residual: δ(n) = −(v − a(n)) · σ′(z(n)), where "·" denotes the element-wise product, z(n) = W(n)a(n-1) + b(n), and σ′ denotes the derivative of σ(x);
For each layer l = n−1, n−2, …, 2, compute: δ(l) = ((W(l+1))T δ(l+1)) · σ′(z(l));
Compute the partial derivative values: ∂J(W, b)/∂W(i) = δ(i)(a(i−1))T and ∂J(W, b)/∂b(i) = δ(i), i.e. the partial derivatives of J(W, b) with respect to W(i) and b(i);
Step 2.1.3. Add the obtained partial derivatives to ΔW(i) and Δb(i) respectively, i.e. ΔW(i) := ΔW(i) + ∂J(W, b)/∂W(i), Δb(i) := Δb(i) + ∂J(W, b)/∂b(i);
Step 2.1.4. Update the parameters W(i) and b(i): W(i) := W(i) − α((1/m)ΔW(i) + λW(i)), b(i) := b(i) − α(1/m)Δb(i), where α is the learning rate;
Step 2.1.5. Repeat steps 2.1.2 to 2.1.4, gradually reducing the value of the objective function until it falls below the preset threshold; this yields the coding-layer parameters (W, b) and the decoding-layer parameters (W′, b′);
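Steps 2.1.1 to 2.1.5 can be sketched as a plain gradient-descent loop; for brevity this sketch omits the weight-decay and sparsity terms of the objective, so it is an outline under stated assumptions rather than the claimed method:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(V, n_hidden, alpha=0.5, iters=200, seed=0):
    """V: t x m data. Returns encoder (W1, b1) and decoder (W2, b2)."""
    rng = np.random.default_rng(seed)
    t, m = V.shape
    W1 = 0.1 * rng.standard_normal((n_hidden, t)); b1 = np.zeros((n_hidden, 1))
    W2 = 0.1 * rng.standard_normal((t, n_hidden)); b2 = np.zeros((t, 1))
    for _ in range(iters):
        a1 = sigmoid(W1 @ V + b1)            # feedforward: hidden excitation
        a2 = sigmoid(W2 @ a1 + b2)           # feedforward: reconstruction
        d2 = -(V - a2) * a2 * (1 - a2)       # output residual delta(n)
        d1 = (W2.T @ d2) * a1 * (1 - a1)     # backpropagated residual
        # accumulate partial derivatives over the batch and update (2.1.3-2.1.4)
        W2 -= alpha * (d2 @ a1.T) / m; b2 -= alpha * d2.mean(axis=1, keepdims=True)
        W1 -= alpha * (d1 @ V.T) / m;  b1 -= alpha * d1.mean(axis=1, keepdims=True)
    return (W1, b1), (W2, b2)
```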
Step 2.2. After training, discard the decoding layer (W′, b′) and use the coding-layer parameters (W, b) as the initial parameters of the corresponding level in the stacked autoencoder network, i.e. W(1) = W, b(1) = b;
Step 2.3. Compute the hidden-layer excitation of the current autoencoder: a(1) = σ(W(1)a(0) + b(1));
Step 2.4. Train the second layer, i.e. the second autoencoder, on the excitation a(1); the hidden layer of the first autoencoder serves as the input layer of the second autoencoder, and the training process is identical to that of the first layer except that the input becomes a(1); training yields the initial parameters W(2), b(2) of the second layer of the network and the hidden-layer excitation a(2);
Step 2.5. For the third layer up to the n-th layer, repeat the process of steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer, finally giving the excitation a(n) of the n-th hidden layer; this excitation also serves as the input of the softmax layer and is denoted aS;
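The greedy layer-wise pre-training of steps 2.2 to 2.5 amounts to the following wiring; `pretrain_layer` here is only a random-initialization stand-in for the full step-2.1 optimization, so the whole block is an assumption-laden sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(A, n_hidden, rng):
    """Stand-in for one autoencoder: random encoder init only.
    A real run would minimize the step-2.1 objective instead."""
    W = 0.1 * rng.standard_normal((n_hidden, A.shape[0]))
    b = np.zeros((n_hidden, 1))
    return W, b

def pretrain_stack(A0, hidden_sizes, seed=0):
    rng = np.random.default_rng(seed)
    params, A = [], A0
    for n_h in hidden_sizes:
        W, b = pretrain_layer(A, n_h, rng)  # step 2.1 on the current input
        params.append((W, b))               # keep encoder, drop decoder (step 2.2)
        A = sigmoid(W @ A + b)              # excitation feeds the next layer (2.3-2.5)
    return params, A                        # A is a(n), i.e. the softmax input aS
```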
Step 2.6. Use the aS obtained in step 2.5 together with the labels Y to train the last layer of the network, i.e. the softmax classifier, obtaining the initial parameter WS of the last layer;
Let x and θ denote aS and WS respectively, and assume there are k classes in total; for the i-th sample, the predicted probability of its belonging to the j-th class is:
P(yi = j | xi; θ) = exp(θjT xi) / Σ_{l=1..k} exp(θlT xi)
where θj denotes the j-th row of θ, i.e. the row vector of weights connecting the j-th output unit with all input units, and k is the number of classes; the final output is a probability column vector P whose j-th component represents the probability that the test sample is judged to be of the j-th class; the weight matrix θ is trained by minimizing the loss function:
J(θ) = −(1/m) Σ_{i=1..m} Σ_{j=1..k} 1{yi = j} log P(yi = j)
where log P(yi = j) denotes the natural logarithm of the probability P(yi = j), and 1{·} is the indicator function, whose value is 1 when the condition in braces is true and 0 otherwise; m is the number of samples, and n is the number of layers of the autoencoder;
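The softmax probability and loss of steps 2.5 and 2.6 can be sketched as follows (0-based class labels and the max-subtraction trick for numerical stability are my additions):

```python
import numpy as np

def softmax_probs(theta, X):
    """theta: k x d weight matrix; X: d x m inputs. Returns k x m probabilities."""
    Z = theta @ X
    Z = Z - Z.max(axis=0, keepdims=True)   # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def softmax_loss(theta, X, y):
    """Mean negative log-probability of the true classes y (0-based labels)."""
    P = softmax_probs(theta, X)
    m = X.shape[1]
    return -np.log(P[y, np.arange(m)]).mean()
```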
Step 3. Fine-tuning: the network is regarded as a whole, the partial derivatives of the parameters of each layer are computed with back-propagation, and gradient descent is then used for iterative optimization; the detailed process is as follows:
Step 3.1. Use the formula a(i) = σ(W(i)a(i-1) + b(i)) to perform feedforward computation, obtaining the excitation a(i) of each layer;
Step 3.2. Compute the partial derivative of the objective with respect to the softmax-layer parameter WS, where P is the conditional probability vector obtained in step 2.6;
Step 3.3. Compute the residual of the last hidden layer: δ(n) = −(∂J(W, b)/∂a(n)) · σ′(z(n)), where ∂J(W, b)/∂a(n) denotes the partial derivative of J(W, b) with respect to a(n), and a(n) is the excitation of the n-th hidden layer;
Step 3.4. For each layer l = n−1, n−2, …, 2, compute: δ(l) = ((W(l+1))T δ(l+1)) · σ′(z(l));
Step 3.5. Compute the partial derivatives for each hidden layer: ∂J(W, b)/∂W(i) = δ(i)(a(i−1))T, ∂J(W, b)/∂b(i) = δ(i);
Step 3.6. Use the partial derivatives obtained in the above steps to update the parameters of each layer: W(i) := W(i) − α ∂J(W, b)/∂W(i), b(i) := b(i) − α ∂J(W, b)/∂b(i), WS := WS − α ∂J(W, b)/∂WS;
where ∂J(W, b)/∂WS denotes the partial derivative of J(W, b) with respect to WS, WS being the initial parameter of the softmax classifier, and ∂J(W, b)/∂W(i) and ∂J(W, b)/∂b(i) denote the partial derivatives of J(W, b) with respect to W(i) and b(i); Step 3.7. Repeat the above steps, reducing the value of the objective function by iteration until it falls below the preset threshold;
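One fine-tuning iteration of step 3 (feedforward, softmax gradient, backpropagated residuals, gradient-descent updates) might look like this sketch; the one-hot label matrix, all shapes, and the function name are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_step(layers, theta, X, Y1hot, alpha=0.1):
    """layers: list of (W, b); theta: k x d softmax weights; Y1hot: k x m labels."""
    m = X.shape[1]
    acts = [X]
    for W, b in layers:                          # step 3.1: feedforward
        acts.append(sigmoid(W @ acts[-1] + b))
    Z = theta @ acts[-1]
    P = np.exp(Z - Z.max(axis=0, keepdims=True))
    P /= P.sum(axis=0, keepdims=True)            # conditional probabilities
    grad_theta = (P - Y1hot) @ acts[-1].T / m    # step 3.2: softmax gradient
    delta = (theta.T @ (P - Y1hot)) * acts[-1] * (1 - acts[-1])  # step 3.3
    new_layers = []
    for i in range(len(layers) - 1, -1, -1):     # steps 3.4-3.6: backprop + update
        W, b = layers[i]
        gW = delta @ acts[i].T / m
        gb = delta.mean(axis=1, keepdims=True)
        if i > 0:
            delta = (W.T @ delta) * acts[i] * (1 - acts[i])
        new_layers.append((W - alpha * gW, b - alpha * gb))
    return list(reversed(new_layers)), theta - alpha * grad_theta
```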
Step 4. Predict the class of a test sample; the detailed process is as follows:
Step 4.1. Normalize the test sample vp to [0, 1];
Step 4.2. For each hidden layer, use the formula a(i) = σ(W(i)a(i-1) + b(i)) to perform feedforward computation layer by layer, obtaining the input aS of the softmax layer;
Step 4.3. Compute the conditional probability vector P according to the probability formula of step 2.5; the class corresponding to the largest component of P is the predicted class of the sample.
2. The medical gas recognition method based on deep learning according to claim 1, characterized in that in step 2.6 the detailed process of training the weight matrix θ by minimizing the loss function is as follows:
Step 2.6.1. Randomly initialize the parameter matrix θ;
Step 2.6.2. Directly compute the derivative of J(θ): ∇θj J(θ) = −(1/m) Σ_{i=1..m} [xi(1{yi = j} − P(yi = j | xi; θ))], where θj denotes the j-th row of the matrix;
Step 2.6.3. Update the parameter θ: θj := θj − α ∇θj J(θ), where α is the learning rate and ∇θj J(θ) denotes the partial derivative of J(θ) with respect to θj;
Step 2.6.4. Repeat steps 2.6.2 to 2.6.3, gradually reducing the value of J(θ) until it falls below the preset threshold; the θ obtained at this point is the final weight matrix, namely WS.
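The claim-2 loop (steps 2.6.1 to 2.6.4) can be sketched as batch gradient descent on J(θ); the hyperparameters, 0-based labels, and one-hot encoding are illustrative assumptions:

```python
import numpy as np

def train_softmax(X, y, k, alpha=0.5, iters=300, seed=0):
    """X: d x m inputs, y: length-m 0-based labels, k: number of classes."""
    rng = np.random.default_rng(seed)
    d, m = X.shape
    theta = 0.01 * rng.standard_normal((k, d))   # step 2.6.1: random init
    Y = np.eye(k)[:, y]                          # one-hot targets, k x m
    for _ in range(iters):                       # steps 2.6.2-2.6.4
        Z = theta @ X
        P = np.exp(Z - Z.max(axis=0, keepdims=True))
        P /= P.sum(axis=0, keepdims=True)
        theta -= alpha * (P - Y) @ X.T / m       # gradient step on J(theta)
    return theta
```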
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310503402.9A CN103544392B (en) | 2013-10-23 | 2013-10-23 | Medical science Gas Distinguishing Method based on degree of depth study |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103544392A CN103544392A (en) | 2014-01-29 |
CN103544392B true CN103544392B (en) | 2016-08-24 |
Family
ID=49967837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310503402.9A Expired - Fee Related CN103544392B (en) | 2013-10-23 | 2013-10-23 | Medical science Gas Distinguishing Method based on degree of depth study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103544392B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103996056B (en) * | 2014-04-08 | 2017-05-24 | 浙江工业大学 | Tattoo image classification method based on deep learning |
CN104021224A (en) * | 2014-06-25 | 2014-09-03 | 中国科学院自动化研究所 | Image labeling method based on layer-by-layer label fusing deep network |
CN104484684B (en) * | 2015-01-05 | 2018-11-02 | 苏州大学 | A kind of Manuscripted Characters Identification Method and system |
CN105844331B (en) * | 2015-01-15 | 2018-05-25 | 富士通株式会社 | The training method of nerve network system and the nerve network system |
CN104866727A (en) | 2015-06-02 | 2015-08-26 | 陈宽 | Deep learning-based method for analyzing medical data and intelligent analyzer thereof |
CN105913079B (en) * | 2016-04-08 | 2019-04-23 | 重庆大学 | Electronic nose isomeric data recognition methods based on the study of the target domain migration limit |
CN106202054B (en) * | 2016-07-25 | 2018-12-14 | 哈尔滨工业大学 | A kind of name entity recognition method towards medical field based on deep learning |
CN106264460B (en) * | 2016-07-29 | 2019-11-19 | 北京医拍智能科技有限公司 | The coding/decoding method and device of cerebration multidimensional time-series signal based on self study |
CN106156530A (en) * | 2016-08-03 | 2016-11-23 | 北京好运到信息科技有限公司 | Health check-up data analysing method based on stack own coding device and device |
CN108122035B (en) | 2016-11-29 | 2019-10-18 | 科大讯飞股份有限公司 | End-to-end modeling method and system |
CN107368671A (en) * | 2017-06-07 | 2017-11-21 | 万香波 | System and method are supported in benign gastritis pathological diagnosis based on big data deep learning |
CN107368670A (en) * | 2017-06-07 | 2017-11-21 | 万香波 | Stomach cancer pathology diagnostic support system and method based on big data deep learning |
US10885470B2 (en) * | 2017-06-26 | 2021-01-05 | D5Ai Llc | Selective training for decorrelation of errors |
CN108416439B (en) * | 2018-02-09 | 2020-01-03 | 中南大学 | Oil refining process product prediction method and system based on variable weighted deep learning |
CN109472303A (en) * | 2018-10-30 | 2019-03-15 | 浙江工商大学 | A kind of gas sensor drift compensation method based on autoencoder network decision |
CN111474297B (en) * | 2020-03-09 | 2022-05-03 | 重庆邮电大学 | Online drift compensation method for sensor in bionic olfaction system |
CN111340132B (en) * | 2020-03-10 | 2024-02-02 | 南京工业大学 | Machine olfaction mode identification method based on DA-SVM |
CN111915069B (en) * | 2020-07-17 | 2021-12-07 | 天津理工大学 | Deep learning-based detection method for distribution of lightweight toxic and harmful gases |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101135639A (en) * | 2007-09-27 | 2008-03-05 | 中国人民解放军空军工程大学 | Mixture gas component concentration infrared spectrum analysis method based on supporting vector quantities machine correct model |
CN102411687A (en) * | 2011-11-22 | 2012-04-11 | 华北电力大学 | Deep learning detection method of unknown malicious codes |
CN103267793A (en) * | 2013-05-03 | 2013-08-28 | 浙江工商大学 | Carbon nano-tube ionization self-resonance type gas sensitive sensor |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101042339B (en) * | 2006-03-21 | 2012-05-30 | 深圳迈瑞生物医疗电子股份有限公司 | Device for recognizing zone classification of anesthetic gas type and method thereof |
- 2013-10-23 CN CN201310503402.9A patent/CN103544392B/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
A comparative study of SVM and BP algorithms in gas recognition; Wang Dan et al.; Chinese Journal of Sensors and Actuators (《传感技术学报》); 20050326; Vol. 18, No. 1; full text *
Gas recognition research based on support vector machines and wavelet decomposition; Ge Haifeng; Chinese Journal of Scientific Instrument (《仪器仪表学报》); 20060630; Vol. 27, No. 6; full text *
Gas recognition research based on the support vector machine algorithm; Wang Dan; Transducer Technology (《传感器技术》); 20050220; Vol. 24, No. 2; full text *
Also Published As
Publication number | Publication date |
---|---|
CN103544392A (en) | 2014-01-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103544392B (en) | Medical science Gas Distinguishing Method based on degree of depth study | |
CN103728551B (en) | A kind of analog-circuit fault diagnosis method based on cascade integrated classifier | |
CN101135689B (en) | Electric nose development platform | |
CN105738109A (en) | Bearing fault classification diagnosis method based on sparse representation and ensemble learning | |
CN109979541B (en) | Method for predicting pharmacokinetic property and toxicity of drug molecules based on capsule network | |
CN110298264B (en) | Human body daily behavior activity recognition optimization method based on stacked noise reduction self-encoder | |
CN103268607B (en) | A kind of common object detection method under weak supervision condition | |
CN111340132B (en) | Machine olfaction mode identification method based on DA-SVM | |
Cai et al. | Anomaly detection of earthquake precursor data using long short-term memory networks | |
CN105609116B (en) | A kind of automatic identifying method in speech emotional dimension region | |
CN108447057A (en) | SAR image change detection based on conspicuousness and depth convolutional network | |
CN110455512B (en) | Rotary mechanical multi-integration fault diagnosis method based on depth self-encoder DAE | |
Glezakos et al. | Plant virus identification based on neural networks with evolutionary preprocessing | |
CN115343676B (en) | Feature optimization method for positioning technology of redundant substances in sealed electronic equipment | |
CN111354338A (en) | Parkinson speech recognition system based on PSO convolution kernel optimization sparse transfer learning | |
CN113887342A (en) | Equipment fault diagnosis method based on multi-source signals and deep learning | |
Ye et al. | A deep learning-based method for automatic abnormal data detection: Case study for bridge structural health monitoring | |
Yang et al. | Stacking-based and improved convolutional neural network: a new approach in rice leaf disease identification | |
CN110488020A (en) | A kind of protein glycation site identification method | |
CN117516939A (en) | Bearing cross-working condition fault detection method and system based on improved EfficientNetV2 | |
CN117350364A (en) | Knowledge distillation-based code pre-training model countermeasure sample generation method and system | |
Orlic et al. | Earthquake—explosion discrimination using genetic algorithm-based boosting approach | |
Castro-Cabrera et al. | Adaptive classification using incremental learning for seismic-volcanic signals with concept drift | |
CN112541524A (en) | BP-Adaboost multi-source information motor fault diagnosis method based on attention mechanism improvement | |
CN113378935B (en) | Intelligent olfactory sensation identification method for gas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20160824; Termination date: 20171023 |