CN108171319A - Method for constructing a deep convolution model with adaptive network connections - Google Patents

Method for constructing a deep convolution model with adaptive network connections

Info

Publication number
CN108171319A
CN108171319A
Authority
CN
China
Prior art keywords
value
formula
loss function
weight
weight vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711268262.6A
Other languages
Chinese (zh)
Inventor
田青
张文强
孔勇
张玉飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201711268262.6A
Publication of CN108171319A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks

Abstract

The invention provides a method for constructing a deep convolution model with adaptive network connections, comprising the following steps. Step 1: orthogonalize the weight vectors in the convolutional neural network. Step 2: delete connections between layers of the convolutional neural network according to a norm criterion. Step 3: construct the activation function. Orthogonalizing the weight vectors reduces data dependence; deleting inter-layer connections based on the p-norm effectively reduces over-fitting during construction of the deep convolution model and makes the network connections adaptive; and the constructed activation function summarizes as much of the data information as possible without changing the parameter scale, improving the accuracy of the resulting deep convolution model.

Description

Method for constructing a deep convolution model with adaptive network connections
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method for constructing a deep convolution model with adaptive network connections.
Background art
With the continuous development of information technology and the popularization of electronic devices, people rely increasingly on the Internet and leave massive amounts of data on it. As deep learning research advances, deep learning models are increasingly used to solve real-world problems. However, faced with huge data volumes and complex real-world conditions, deep learning models often run into problems during training, for example: too many model parameters, serious data redundancy, and greatly increased training time. These problems stem in part from imperfections of traditional neural networks. For example, the fully connected layers of traditional neural network models use random pruning to mitigate over-fitting, but this process is random, may discard important information, and lacks robustness; likewise, the processing of sample data is often too simplistic and may lose a large amount of data information.
In terms of data processing, one existing optimization strategy for deep convolution models (Convolutional Neural Networks, CNN) is a convolution-kernel redundancy elimination method: to cope with the large storage and computing resources required during the training stage, it applies an improved strategy for discarding redundant convolution kernels, thereby reducing the training cost of the model.
Meanwhile, the prior art observes that deep convolutional networks usually extract features with over-parameterized convolution kernels, and that this feature-extraction mode leads to different sparsity levels across kernels: the weight parameters of some kernels may be overly sparse, limiting the performance gains of the model. The prior art therefore proposes filtering out kernels whose sparsity exceeds a certain threshold, simplifying the CNN model and improving efficiency. Specifically, the sparsity of a convolution kernel is defined with respect to the weight parameters of its convolutional layer. For convolutional layer l, let M_l denote the mean of the absolute values of all convolution-kernel weight parameters in layer l, and let S_l(n) denote the sparsity of the n-th kernel in layer l. If most weights of kernel n are smaller than the layer mean, S_l(n) approaches 1, which means the current kernel is more redundant than the other kernels. For this case, the prior art proposes the following two methods that use sparsity to prune redundant convolution kernels.
(1) Hierarchical optimization: all convolution kernels within the same convolutional layer are sorted in descending order of sparsity, and the number of kernels to delete is determined by a reduction coefficient (a sketch of this procedure follows after this list);
(2) Gradient-based optimization: a regression model is constructed with a neural network to learn the relationship between the reduction coefficient r and the performance P of the CNN.
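As an illustration only, the following is a minimal sketch of the sparsity-based kernel pruning described above, under the assumption that S_l(n) is read as the fraction of kernel n's weights whose absolute value falls below the layer mean M_l; all function and variable names are illustrative and not part of the cited prior art.

```python
import numpy as np

def kernel_sparsity(layer_weights):
    """layer_weights: array of shape (num_kernels, ...) for one conv layer.
    Returns S_l(n) per kernel, read here as the fraction of its weights
    whose absolute value falls below the layer-wide mean M_l."""
    m_l = np.abs(layer_weights).mean()                 # M_l
    flat = np.abs(layer_weights).reshape(layer_weights.shape[0], -1)
    return (flat < m_l).mean(axis=1)                   # S_l(n) in [0, 1]

def hierarchical_prune(layer_weights, reduction_coeff):
    """Hierarchical optimization: sort kernels by sparsity (descending)
    and drop the fraction given by the reduction coefficient."""
    s = kernel_sparsity(layer_weights)
    order = np.argsort(-s)                             # most redundant first
    n_drop = int(reduction_coeff * len(order))
    keep = np.sort(order[n_drop:])
    return layer_weights[keep], keep

# toy usage: 16 kernels of shape 3x3x3, drop the 25% most redundant
w = np.random.randn(16, 3, 3, 3)
pruned, kept_idx = hierarchical_prune(w, reduction_coeff=0.25)
```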
Regarding improvements to the activation function, the prior art notes that when building an artificial neural network, a neuron is usually represented by an activation unit whose output is computed by an activation function f. ReLU is a widely used activation function with several improved variants; it is cheap to compute, requires no pre-training of the network, and speeds up training. However, ReLU units are fragile during training and can be "killed": a large gradient flowing into a ReLU unit may cause that unit never to be activated again for any data point. LReLU was proposed to address this fragility of ReLU, but its accuracy is lower than ReLU's. The neuron activation function AReLU proposed in the prior art combines high accuracy with high robustness, and its accuracy is also improved; the AReLU function is defined in formula (1).
Figure 1 is a schematic diagram of the AReLU function in the prior art. As shown in Figure 1, when the positive and negative regions of AReLU share the same parameter, the effect is unsatisfactory: if a is too small, convergence in the positive region (x ≥ 0) slows down; if a is too large, the unilateral inhibition principle is violated and oscillation occurs. The prior art therefore further improves the AReLU activation function so that it has one hyperparameter for x ≥ 0 and another for x < 0, separately controlling the gradients of the positive and negative regions; this is the logReLU function. Figure 2 is a schematic diagram of the logReLU function in the prior art; it is defined in formula (2).
Although a variety of deep learning methods have been proposed in the prior art, deep learning by nature requires massive amounts of training data, and frameworks such as Caffe (Convolutional Architecture for Fast Feature Embedding) still face some unavoidable problems, for example in the processing of training sample data, in the model construction process and its algorithms, and in model solving and application.
First, regarding the processing of training sample data: deep learning mainly studies complex problems, and the fields involved, such as image/video processing and natural language processing, are mostly high-dimensional problems carrying huge amounts of information. The processing of the sample data therefore directly affects the abstract model that is built and the result of solving it. Taking the activation layers as an example, common activation functions include ReLU and Sigmoid, and other researchers have proposed PReLU, RReLU, and similar functions, but none of these activation functions can fully capture the information in the sample data; much information is lost, errors appear in the constructed model, and the model solution goes wrong.
Second, regarding the model construction process and its algorithms: traditional neural network loss layers mostly use SGD (Stochastic Gradient Descent), which builds the model by iteratively minimizing the loss so as to summarize the data information. This method has drawbacks: when processing high-dimensional information such as images, video, or natural language, it produces a large number of parameters; once the parameter count reaches a certain scale, the large amount of redundancy severely slows down model construction and solving and also degrades the quality of the resulting model.
In addition, regarding model solving and application: in the fully connected layer (inner product layer), random pruning is mostly used to remove the influence of certain parameters in order to avoid over-fitting, i.e. the model relying too heavily on the training samples. However, random pruning is by nature random and its outcome is uncontrollable; if more important parameters are cut off, the solution of the model is strongly affected and the accuracy of the solution suffers directly. The method also has poor robustness and carries a certain risk.
Summary of the invention
The present invention provides a method for constructing a deep convolution model with adaptive network connections, in order to solve the serious data-redundancy problem of existing deep learning methods and to improve the accuracy and efficiency of constructing deep convolution models.
To solve the above problems, the present invention provides a method for constructing a deep convolution model with adaptive network connections, comprising the following steps:
Step 1: orthogonalize the weight vectors in the convolutional neural network, using formulas (1) and (2) below:
w_i = v_i  (1)
w_j = v_j − ((w_i^T v_j) / (w_i^T w_i)) w_i  (2)
In the formulas, v_i and v_j are respectively the i-th and j-th weight vectors of the same network layer; w_i is the weight vector corresponding to v_i after orthogonalization against v_j, and w_j is the weight vector corresponding to v_j after orthogonalization against v_i;
Step 2: delete connections between layers of the convolutional neural network according to a norm criterion, as shown in formula (3) below:
In the formula, w_i and w_j are the i-th and j-th weight vectors belonging to the same network layer, and ||w_i||_p is the p-norm of the i-th weight vector. After each weight-vector update, the p-norm of that weight vector is added to the p-norm of any other weight vector w_j of the same layer; if the sum is 0, the weight vector w_i is set to the zero vector;
Step 3: construct the activation function shown in formula (4) below:
f(x) = sigmoid(x) + λ  (4)
In the formula, λ is computed as follows. First set λ to an arbitrary value λ_0 and compute the loss function value y_1; then add an increment ε to λ and compute the loss value y_2. If y_2 is greater than y_1, keep increasing λ and recomputing the loss until the current loss value is smaller than the previous one, then stop iterating; λ at that point is the local optimum. If y_2 is less than y_1, keep decreasing λ and recomputing the loss until the current loss value is smaller than the previously computed one, then stop iterating; λ at that point is the local optimum. The λ value taken as the local optimum is substituted into formula (4) for the calculation.
The method for constructing a deep convolution model with adaptive network connections provided by the invention reduces data dependence through weight-vector orthogonalization, reduces the parameter scale of the deep convolution model, and improves training speed and network efficiency. Deleting connections between layers of the convolutional neural network based on the p-norm effectively reduces over-fitting during construction of the deep convolution model and makes the network connections adaptive, i.e. so-called "data-driven", so that the final deep convolution model is more comprehensive and more credible. The activation function proposed by the invention processes the data samples so as to summarize as much data information as possible without changing the parameter scale, improving the accuracy of the resulting deep convolution model.
Description of the drawings
Figure 1 is a schematic diagram of the AReLU function in the prior art;
Figure 2 is a schematic diagram of the logReLU function in the prior art;
Figure 3 is a flow chart of the method for constructing a deep convolution model with adaptive network connections in the specific embodiment of the invention;
Figure 4 is a schematic diagram of the McCulloch-Pitts model in the prior art;
Figure 5A is a schematic diagram of the weight connections between layers in a traditional neural network;
Figure 5B is a schematic diagram of the connections between layers of a neural network after sparse interaction in the prior art;
Figures 6A and 6B are schematic diagrams of sigmoid functions;
Figure 7 is a framework diagram of the deep convolution model with adaptive network connections built in the specific embodiment of the invention.
Specific embodiment
A specific embodiment of the method for constructing a deep convolution model with adaptive network connections provided by the invention is described in detail below with reference to the accompanying drawings.
Figure 3 is a flow chart of the method for constructing a deep convolution model with adaptive network connections in this specific embodiment. As shown in Figure 3, the method provided by this embodiment comprises the following steps.
Step 1: orthogonalize the weight vectors in the convolutional neural network, using formulas (5) and (6) below:
w_i = v_i  (5)
w_j = v_j − ((w_i^T v_j) / (w_i^T w_i)) w_i  (6)
In the formulas, v_i and v_j are respectively the i-th and j-th weight vectors of the same network layer; w_i is the weight vector corresponding to v_i after orthogonalization against v_j, and w_j is the weight vector corresponding to v_j after orthogonalization against v_i.
The basic unit of a convolutional neural network, the neuron, has multiple inputs and a single output. The classical neuron model, the McCulloch-Pitts model, is constructed from an information-processing viewpoint. Figure 4 is a schematic diagram of the McCulloch-Pitts model in the prior art. As shown in Figure 4, the j-th neuron receives input signals x_i from multiple other neurons. Each synaptic strength is represented by a real coefficient: w_ij is the weight applied by the i-th neuron to the operation of the j-th neuron. A specific operation combines the effects of the input signals to give their total effect, called the "net input" and denoted Net_j or I_j. Net-input expressions take many forms; the simplest is a linear weighted sum:
Net_j = Σ_i w_ij x_i  (7)
The net input changes the state of neuron j, and the output y_j of neuron j is a function of its current state. The mathematical expression of the McCulloch-Pitts model is:
y_j = sgn(Σ_i w_ij x_i − θ_j)  (8)
In formula (8), θ_j is the threshold and sgn is the sign function: when the net input exceeds the threshold, y_j outputs +1, otherwise −1. An artificial neural network formed by connecting a large number of such neurons exhibits several features of the human brain and has preliminary adaptive and self-organizing capabilities. During learning or training, the synaptic weights w_ij are changed to adapt to the requirements of the surrounding environment.
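For clarity, a minimal sketch of the McCulloch-Pitts neuron of formulas (7) and (8); the values used are illustrative.

```python
import numpy as np

def mcculloch_pitts(x, w, theta):
    """McCulloch-Pitts neuron: net input Net_j = sum_i w_ij * x_i (formula (7)),
    output y_j = +1 if the net input exceeds the threshold theta_j, else -1
    (formula (8))."""
    net = np.dot(w, x)
    return 1 if net > theta else -1

# toy usage: three inputs feeding one neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.3, 0.2, 0.4])
print(mcculloch_pitts(x, w, theta=0.5))   # net = 0.75 > 0.5, so the output is 1
```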
Each layer of a convolutional neural network has weight vectors, and these weight vectors determine the layer's output; by training on data, the weight vectors of the model can reflect the features of the data well. Because the weight vectors are crucial to the precision of the whole model, this embodiment improves their fitting by applying the Gram-Schmidt orthogonalization shown in formulas (5) and (6), i.e. weight-vector orthogonalization. The main purpose of orthogonalizing the weight vectors is to reduce the correlation between neuron weight connections. If (w_i)^T * w_j = 0, the i-th and j-th weight vectors of the same layer are orthogonal and the correlation between the two vectors is minimal; orthogonalization therefore reduces the correlation between weight vectors and between parameters, so that the parameters reflect the features of the data more effectively and the data features are captured more fully.
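A minimal NumPy sketch of the Gram-Schmidt weight-vector orthogonalization of formulas (5) and (6), applied to the weight vectors (rows) of one layer; the function and variable names are illustrative and not taken from the patent.

```python
import numpy as np

def orthogonalize_weights(V, eps=1e-12):
    """Gram-Schmidt over the weight vectors of one layer (rows of V):
    w_1 = v_1, and each later w_j is v_j minus its projections onto the
    already-orthogonalized vectors, per formulas (5) and (6)."""
    W = V.astype(float).copy()
    for j in range(1, W.shape[0]):
        for i in range(j):
            denom = np.dot(W[i], W[i])
            if denom > eps:                    # skip (near-)zero vectors
                W[j] -= (np.dot(W[i], W[j]) / denom) * W[i]
    return W

# toy usage: 4 weight vectors of dimension 6
V = np.random.randn(4, 6)
W = orthogonalize_weights(V)
print(np.round(W @ W.T, 6))                    # off-diagonal entries ~ 0
```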
Step 2: delete connections between layers of the convolutional neural network according to a norm criterion, as shown in formula (9) below:
In the formula, w_i and w_j are the i-th and j-th weight vectors belonging to the same network layer, and ||w_i||_p is the p-norm of the i-th weight vector. After each weight-vector update, the p-norm of that weight vector is added to the p-norm of any other weight vector w_j of the same layer; if the sum is 0, the weight vector w_i is set to the zero vector.
A norm is a basic concept in mathematics. In functional analysis, a norm is defined on a normed linear space and must satisfy certain conditions: (1) non-negativity; (2) homogeneity; (3) the triangle inequality. It is often used to measure the length or size of each vector in a vector space (or matrix space). Common norms are as follows (a short computation sketch follows these definitions):
1-norm: ||x||_1 = |x_1| + |x_2| + |x_3| + ... + |x_n|  (10)
2-norm: ||x||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + ... + |x_n|^2)^(1/2)  (11)
∞-norm: ||x||_∞ = max(|x_1|, |x_2|, ..., |x_n|)  (12)
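For reference, a minimal sketch computing the three norms above directly; the vector is illustrative.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])
norm_1   = np.abs(x).sum()           # 1-norm, formula (10): 8.0
norm_2   = np.sqrt((x ** 2).sum())   # 2-norm, formula (11): ~5.10
norm_inf = np.abs(x).max()           # infinity-norm, formula (12): 4.0
```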
Norms on finite-dimensional spaces have good properties, mainly reflected in the following theorems:
Property 1: for any basis of a finite-dimensional normed linear space, the norm is a continuous function of the coordinates of an element (with respect to that basis).
Property 2 (Minkowski's theorem): all norms on a finite-dimensional linear space are equivalent.
Property 3 (Cauchy convergence principle): a finite-dimensional linear space over the real (or complex) field is necessarily complete (under any norm).
Property 4: a sequence in a finite-dimensional normed linear space converges coordinate-wise if and only if it converges under any norm.
Figure 5A is a schematic diagram of the weight connections between layers in a traditional neural network. As shown in Figure 5A, every neuron of a layer is connected to every neuron of the next layer. This connection pattern gives the model the following defects:
1. Each neuron is connected to too many other neurons, so the model has too many parameters to train and the computational cost is excessive.
2. Too many connections also make it more likely that training the neural network falls into a local optimum.
To address this, the dropout layer of a convolutional neural network adopts sparse interaction, i.e. randomly selected connections between neurons are deleted (a short sketch of this random deletion follows below). Figure 5B is a schematic diagram of the connections between layers of a neural network after sparse interaction in the prior art. Comparing Figure 5A with Figure 5B, the number of connections between the network layers is greatly reduced, which remedies the defects of excessive connections described above; however, sparse interaction removes inter-layer connections at random and lacks a connection-deletion algorithm with a theoretical basis.
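For contrast, a minimal sketch of the random sparse-interaction (dropout-style) deletion just described, where each inter-layer connection is dropped independently with probability p_drop; the names and the drop probability are illustrative.

```python
import numpy as np

def random_sparse_interaction(W, p_drop=0.5, rng=None):
    """Randomly delete inter-layer connections: each weight is kept with
    probability 1 - p_drop and zeroed otherwise. This is the random pruning
    whose lack of selectivity the present embodiment aims to overcome."""
    rng = rng or np.random.default_rng()
    mask = rng.random(W.shape) >= p_drop
    return W * mask

# toy usage
W = np.random.randn(4, 6)
print(random_sparse_interaction(W, p_drop=0.5))
```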
Based on the property that a norm is a function assigning a length to a vector, this embodiment computes the length of each weight vector through its norm and deletes the connections represented by vectors whose length is zero or falls below some threshold. This effectively reduces the number of parameters and the computational cost of the model. Compared with the sparse-interaction approach of deleting weight connections at random, this embodiment deletes weight connections purposefully, making the retention of weight connections selective, so that the deletion of weight connections becomes controllable and acquires a theoretical basis.
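A minimal sketch of the p-norm-based connection deletion described above, assuming the rule is read as zeroing out the weight vectors of a layer whose p-norm is zero or falls below a small threshold; the function names and the threshold are illustrative.

```python
import numpy as np

def p_norm(w, p):
    """p-norm of a weight vector: (sum_k |w_k|^p)^(1/p)."""
    return (np.abs(w) ** p).sum() ** (1.0 / p)

def prune_connections(W, p=2, threshold=1e-3):
    """Zero out (i.e. delete) the weight vectors of one layer whose p-norm
    is at or below the threshold; the remaining vectors are kept unchanged."""
    W = W.copy()
    for i in range(W.shape[0]):
        if p_norm(W[i], p) <= threshold:
            W[i] = 0.0           # the connection represented by w_i is deleted
    return W

# toy usage: the second weight vector is nearly zero and gets pruned
W = np.array([[0.5, -0.2, 0.1],
              [1e-5, -1e-6, 0.0],
              [0.3,  0.4, -0.7]])
print(prune_connections(W, p=2, threshold=1e-3))
```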
This embodiment therefore reduces the redundancy of weight connections through norm-based deletion of weight connections, while the weight-vector orthogonalization of Step 1 mainly aims to reduce the correlation between weight connections. Although both serve to enhance the learning ability of the neural network, they do so from two different aspects, and applying the two together in the neural network can fully improve its learning efficiency.
Step 3: construct the activation function shown in formula (13) below:
f(x) = sigmoid(x) + λ  (13)
In the formula, λ is computed as follows. First set λ to an arbitrary value λ_0 and compute the loss function value y_1; then add an increment ε to λ and compute the loss value y_2. If y_2 is greater than y_1, keep increasing λ and recomputing the loss until the current loss value is smaller than the previous one, then stop iterating; λ at that point is the local optimum. If y_2 is less than y_1, keep decreasing λ and recomputing the loss until the current loss value is smaller than the previously computed one, then stop iterating; λ at that point is the local optimum. The λ value taken as the local optimum is substituted into formula (13) for the calculation.
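As a concrete illustration, the sketch below implements one plausible reading of this λ search: λ is stepped in whichever direction reduces the loss, and the iteration stops once the loss no longer improves. The translated description leaves the direction and stopping conventions ambiguous, so this reading, the loss function, the step size ε, and the starting point λ_0 are all assumptions.

```python
def find_lambda(loss_fn, lam0=0.0, eps=0.01, max_iters=1000):
    """Local search for lambda in f(x) = sigmoid(x) + lambda.
    Assumption: step lambda toward the smaller loss and stop once the loss
    stops improving (read here as the local optimum in the text)."""
    y1 = loss_fn(lam0)
    y2 = loss_fn(lam0 + eps)
    step = eps if y2 < y1 else -eps                  # move toward the smaller loss
    lam, best = (lam0 + eps, y2) if y2 < y1 else (lam0, y1)
    for _ in range(max_iters):
        cand = loss_fn(lam + step)
        if cand >= best:                             # no further improvement
            break
        lam, best = lam + step, cand
    return lam

# toy usage with an illustrative quadratic loss minimized at lambda = 0.3
best_lambda = find_lambda(lambda lam: (lam - 0.3) ** 2, lam0=0.0, eps=0.05)
print(round(best_lambda, 2))                         # ~0.3
```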
The rule by which a neuron produces an output signal from its input signals is given by the neuron function f (Activation Function, also called the transfer function); this is the external characteristic of the neuron model. The activation function covers the whole process from the input signals to the net input, then to the activation value, and finally to the output signal. Because it combines the effect of the net input with f, the activation function directly determines how data is mapped, and different activation functions also affect the back-propagation algorithm through their different gradients. The function f takes many forms, and neural networks with various capabilities can be built by exploiting their different characteristics.
Common neuron functions in the art include the following:
1. Simple linear function
The neuron function f takes continuous values; the input x is weighted by the connection matrix W to produce the output:
f(x) = x  (14)
2. Symmetric hard-limiting function
The symmetric hard-limiting function is a nonlinear model whose output takes only two values, e.g. +1 or −1 (or 1 and 0 for the hard-limiting function). When the net input exceeds a threshold θ the output is +1, otherwise −1; this behavior can be expressed with the sign function:
f(x) = sgn(x − θ)  (15)
3. Sigmoid function (S-shaped function)
Figures 6A and 6B are schematic diagrams of sigmoid functions. The neuron output is a continuous, non-decreasing function limited between two finite values, which can be written as:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))  (16)
The sigmoid curve of formula (16) is the hyperbolic tangent, whose maximum and minimum are +1 and −1 respectively, as shown in Figure 6A. Translating the hyperbolic function upward gives the unipolar S-shaped function instead:
f(x) = 1 / (1 + e^(−x))  (17)
whose maximum and minimum are 1 and 0 respectively, as shown in Figure 6B.
4. Bipolar activation function
The output range of the general unipolar S-type activation function is 0 to 1, which is not optimal. To solve this, the input range can be changed to ±0.5 while the S-type activation function is biased, so that the output range of the node becomes [−0.5, 0.5]; then
f(x) = −1/2 + 1/(1 + e^(−x))  (18)
In a BP (Back Propagation) algorithm that uses the bipolar activation function, the convergence speed roughly doubles (a short sketch of these four functions follows).
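For reference, a minimal sketch of the four neuron functions listed above, mapped to formulas (14)-(18) as read from the text:

```python
import numpy as np

def linear(x):                      # formula (14)
    return x

def hard_limit(x, theta=0.0):       # formula (15): symmetric hard limiter
    return np.where(x - theta > 0, 1.0, -1.0)

def sigmoid_bipolar(x):             # hyperbolic-tangent sigmoid, range (-1, 1)
    return np.tanh(x)

def sigmoid_unipolar(x):            # unipolar S-shaped function, range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar(x):                     # formula (18): shifted sigmoid, range (-0.5, 0.5)
    return -0.5 + 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
print(bipolar(x))
```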
The above description of conventional activation functions shows the diversity and importance of activation functions and the influence a good activation function has on the learning efficiency of a neural network; finding better activation functions has therefore always been an important entry point for improving neural networks.
The adaptive activation function of formula (13) provided in this embodiment differs from conventional activation functions in that it mainly takes the ReLU function as the basic structure while adding undetermined parameters whose values are driven by the data: the parameters are iteratively changed as data are fed in until they fit the data well, and the fitted parameters then reflect the data features well. The variables in the adaptive activation function change along with the learning process, making learning more flexible and better embodying the adaptivity of the system.
Figure 7 is a framework diagram of the deep convolution model with adaptive network connections built in this embodiment. The neural network established in this embodiment is shown in Figure 7 and ultimately forms the following optimization objective:
min softmax + λ_1 Ω_1 + λ_2 Ω_2  (19)
where the softmax, Ω_1, and Ω_2 terms are explained as follows:
Ω_1 = w_i^T · w_j  (21)
The softmax term is determined by the loss function loss(x, y) together with the weight vectors and their connections (formula (20)); Ω_1 removes the correlation between weights through weight-vector orthogonalization (formula (21)); and Ω_2 is the p-norm-based deletion of weight connections (formula (22)), whose purpose is to remove redundancy between weights. Substituting formulas (20), (21), and (22) into (19), the optimization objective can be rewritten as formula (23):
Formula (23) is the objective to be optimized in the adaptive convolution model proposed by this embodiment, where loss(x, y) is the optimization objective of the conventional model, i.e. the loss function. λ_1 and λ_2 are respectively the proportions of the weight-vector-orthogonalization term and of the p-norm-based connection-deletion term; the larger a parameter is set, the more pronounced the corresponding constraint. w_i is the i-th weight vector of the network, and Σ_i ||w_i||_p is the sum of the p-norms of all weight vectors of the same layer.
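A minimal sketch of evaluating the combined objective for one layer, assuming Ω_1 is the sum of pairwise inner products w_i^T w_j over distinct weight vectors and Ω_2 is the sum of their p-norms; the base loss value and all names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def omega1(W):
    """Pairwise inner products w_i^T w_j summed over i != j
    (formula (21), summed over the layer's weight vectors)."""
    G = W @ W.T
    return G.sum() - np.trace(G)

def omega2(W, p=2):
    """Sum of the p-norms of all weight vectors of the layer."""
    return sum((np.abs(w) ** p).sum() ** (1.0 / p) for w in W)

def objective(base_loss, W, lam1=0.1, lam2=0.01, p=2):
    """Formula (19)/(23): base loss plus the two weighted regularizers."""
    return base_loss + lam1 * omega1(W) + lam2 * omega2(W, p)

# toy usage
W = np.random.randn(4, 6)
print(objective(base_loss=0.42, W=W))
```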
A traditional neural network model is: convolutional layer + linear rectification (ReLU) layer + pooling layer + fully connected layer. The deep convolution model with adaptive network connections provided by this embodiment can likewise be based on these layers, but in the core computation functions of some layers and in the weight calculations, this embodiment differs from the traditional neural network model.
The weight-vector orthogonalization of Step 1 and the p-norm-based weight-connection deletion of Step 2 both optimize the calculation of the weight vectors, and the calculation of the weight vectors happens in the fully connected layer. Therefore, by adding the constraints introduced by this embodiment to the weight-vector calculation of the fully connected layer, the resulting weight vectors reflect the features of the data better.
The adaptive activation function constructed in this embodiment sits in the linear rectification layer; replacing the traditional activation function with the adaptive activation function proposed by this embodiment therefore yields a better modeling effect.
This embodiment first optimizes the fully connected layer: it overcomes the randomness of dropout pruning by introducing the concept of a "p-norm adaptive model" that measures the importance of a connection by its weight distance, achieving intelligent pruning, increasing the validity of the data, and improving model accuracy. Second, in the adaptive design of the activation function: existing activation functions are designed by manually tuning parameters, whereas this embodiment takes the automatic selection of the activation-function form by the model as its breakthrough point, in order to find the optimal activation-function form at each stage. Finally, in data processing, to reduce the data volume and improve data validity, this embodiment proposes judging correlation through the orthogonality of the convolution kernels, extracting data efficiently to guarantee the accuracy and efficiency of the final model.
The above are merely preferred embodiments of the present invention. It should be noted that persons of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (1)

1. A method for constructing a deep convolution model with adaptive network connections, characterized by comprising the following steps:
Step 1: orthogonalize the weight vectors in the convolutional neural network, using formulas (1) and (2) below:
w_i = v_i  (1)
w_j = v_j − ((w_i^T v_j) / (w_i^T w_i)) w_i  (2)
In the formulas, v_i and v_j are respectively the i-th and j-th weight vectors of the same network layer; w_i is the weight vector corresponding to v_i after orthogonalization against v_j, and w_j is the weight vector corresponding to v_j after orthogonalization against v_i;
Step 2: delete connections between layers of the convolutional neural network according to a norm criterion, as shown in formula (3) below:
In the formula, w_i and w_j are the i-th and j-th weight vectors belonging to the same network layer, and ||w_i||_p is the p-norm of the i-th weight vector. After each weight-vector update, the p-norm of that weight vector is added to the p-norm of any other weight vector w_j of the same layer; if the sum is 0, the weight vector w_i is set to the zero vector;
Step 3: construct the activation function shown in formula (4) below:
f(x) = sigmoid(x) + λ  (4)
In the formula, λ is computed as follows. First set λ to an arbitrary value λ_0 and compute the loss function value y_1; then add an increment ε to λ and compute the loss value y_2. If y_2 is greater than y_1, keep increasing λ and recomputing the loss until the current loss value is smaller than the previous one, then stop iterating; λ at that point is the local optimum. If y_2 is less than y_1, keep decreasing λ and recomputing the loss until the current loss value is smaller than the previously computed one, then stop iterating; λ at that point is the local optimum. The λ value taken as the local optimum is substituted into formula (4) for the calculation.
CN201711268262.6A 2017-12-05 2017-12-05 The construction method of the adaptive depth convolution model of network connection Pending CN108171319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711268262.6A CN108171319A (en) 2017-12-05 2017-12-05 The construction method of the adaptive depth convolution model of network connection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711268262.6A CN108171319A (en) 2017-12-05 2017-12-05 The construction method of the adaptive depth convolution model of network connection

Publications (1)

Publication Number Publication Date
CN108171319A true CN108171319A (en) 2018-06-15

Family

ID=62524371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711268262.6A Pending CN108171319A (en) 2017-12-05 2017-12-05 The construction method of the adaptive depth convolution model of network connection

Country Status (1)

Country Link
CN (1) CN108171319A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002890A (en) * 2018-07-11 2018-12-14 北京航空航天大学 The modeling method and device of convolutional neural networks model
CN109344888A (en) * 2018-09-19 2019-02-15 广东工业大学 A kind of image-recognizing method based on convolutional neural networks, device and equipment
CN109214353A (en) * 2018-09-27 2019-01-15 云南大学 A kind of facial image based on beta pruning model quickly detects training method and device
CN109214353B (en) * 2018-09-27 2021-11-23 云南大学 Training method and device for rapid detection of face image based on pruning model
US11366978B2 (en) 2018-10-23 2022-06-21 Samsung Electronics Co., Ltd. Data recognition apparatus and method, and training apparatus and method
CN110736707A (en) * 2019-09-16 2020-01-31 浙江大学 Spectrum detection optimization method for spectrum model transfer from master instruments to slave instruments
CN110874574A (en) * 2019-10-30 2020-03-10 平安科技(深圳)有限公司 Pedestrian re-identification method and device, computer equipment and readable storage medium
WO2021082078A1 (en) * 2019-10-30 2021-05-06 平安科技(深圳)有限公司 Pedestrian re-recognition method and apparatus, computer device, and readable storage medium
TWI740726B (en) * 2020-07-31 2021-09-21 大陸商星宸科技股份有限公司 Sorting method, operation method and apparatus of convolutional neural network
CN112598640A (en) * 2020-12-22 2021-04-02 哈尔滨市科佳通用机电股份有限公司 Water filling port cover plate loss detection method based on deep learning
CN112598640B (en) * 2020-12-22 2021-09-14 哈尔滨市科佳通用机电股份有限公司 Water filling port cover plate loss detection method based on deep learning

Similar Documents

Publication Publication Date Title
CN108171319A (en) The construction method of the adaptive depth convolution model of network connection
CN109214566B (en) Wind power short-term prediction method based on long and short-term memory network
CN107688850B (en) Deep neural network compression method
CN104751842B (en) The optimization method and system of deep neural network
CN107688849A (en) A kind of dynamic strategy fixed point training method and device
Alaloul et al. Data processing using artificial neural networks
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN107679618A (en) A kind of static policies fixed point training method and device
CN107301864A (en) A kind of two-way LSTM acoustic models of depth based on Maxout neurons
CN110321361B (en) Test question recommendation and judgment method based on improved LSTM neural network model
CN107679617A (en) The deep neural network compression method of successive ignition
CN109829541A (en) Deep neural network incremental training method and system based on learning automaton
CN108427665A (en) A kind of text automatic generation method based on LSTM type RNN models
CN110222901A (en) A kind of electric load prediction technique of the Bi-LSTM based on deep learning
CN104636985A (en) Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network
CN106970981B (en) Method for constructing relation extraction model based on transfer matrix
CN108596327A (en) A kind of seismic velocity spectrum artificial intelligence pick-up method based on deep learning
CN113392210A (en) Text classification method and device, electronic equipment and storage medium
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
Shi et al. The prediction of character based on recurrent neural network language model
CN110222844A (en) A kind of compressor performance prediction technique based on artificial neural network
CN111382840B (en) HTM design method based on cyclic learning unit and oriented to natural language processing
CN105550748A (en) Method for constructing novel neural network based on hyperbolic tangent function
CN113157919A (en) Sentence text aspect level emotion classification method and system
CN109670169B (en) Deep learning emotion classification method based on feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180615

RJ01 Rejection of invention patent application after publication