CN108171319A - The construction method of the adaptive depth convolution model of network connection - Google Patents
- Publication number
- CN108171319A CN108171319A CN201711268262.6A CN201711268262A CN108171319A CN 108171319 A CN108171319 A CN 108171319A CN 201711268262 A CN201711268262 A CN 201711268262A CN 108171319 A CN108171319 A CN 108171319A
- Authority
- CN
- China
- Prior art keywords
- value
- formula
- loss function
- weight
- weight vectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The construction method of a network-connection-adaptive deep convolution model provided by the present invention includes the following steps. Step 1: orthogonalize the weight vectors in the convolutional neural network. Step 2: delete connections between layers of the convolutional neural network according to a norm. Step 3: construct the activation function. The present invention reduces data correlation through weight-vector orthogonalization; deleting inter-layer connections based on the p-norm effectively reduces the problem of over-fitting during construction of the deep convolution model and makes the network connections adaptive; and the constructed activation function summarizes the data information as fully as possible without changing the parameter scale, improving the accuracy of the established deep convolution model.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a construction method for a network-connection-adaptive deep convolution model.
Background technology
With the continuous development of information technology and the popularization of electronic equipment, people increasingly rely on the Internet and leave massive amounts of data on it. As deep-learning research advances, people have begun using deep-learning models to solve real-world problems. However, faced with huge data volumes and complicated realities, deep-learning models often run into problems during training, for example: the model has too many parameters, the data are seriously redundant, and the training time grows substantially. Part of these problems stems from imperfections of traditional neural networks. For example, the fully connected layers of a traditional neural network model use random pruning to eliminate over-fitting, but this process is random, may lose important data, and lacks robustness; likewise, sample data are often processed too simply, which may lose a large amount of data information.
Regarding data processing, current optimization strategies for deep convolution models (Convolutional Neural Networks, CNN) include a convolution-kernel redundancy-elimination method for convolutional neural networks. Aiming at the large storage and computing resources that convolutional neural networks require in the training stage, this method adopts an improved strategy for removing redundant convolution kernels, thereby reducing model-training overhead.
Meanwhile propose deep layer convolutional network in the prior art and extract feature usually using the convolution kernel for crossing parametrization, and
This feature extraction mode cause different convolution internuclear there are different degree of rarefications, the weight parameter of part convolution kernel may be excessively
It is sparse, it is limited to the performance boost of model.Therefore, the prior art proposes to filter the higher volume of degree of rarefication according to certain threshold value
Product core, and then CNN models are simplified, efficiency is improved, specific method is:The degree of rarefication of convolution kernel is joined according to the weight of residing convolutional layer
It counts to define.For convolutional layer l, MlThe mean value of all convolution kernel weight parameter absolute values in l layers is represented, n-th in convolutional layer l
The degree of rarefication of convolution kernel is Sl(n).If some weights of convolution kernel n are less than the mean value of weight in l layers, then Sl(n) it can approach
In 1, this also means that current convolution kernel compares other convolution kernels more redundancy.For such case, the prior art propose with
Lower two methods carry out beta pruning using degree of rarefication to the convolution kernel of redundancy.
(1) Layer-wise optimization: all convolution kernels in the same convolutional layer are sorted in descending order of sparsity, and the number of kernels to delete is determined by a reduction coefficient;
(2) Gradient optimization: a regression model is constructed with a neural network to learn the relationship between the reduction coefficient r and the performance P of the CNN network.
Regarding improvements to activation functions, the prior art notes that when building artificial neural networks, a neuron is usually represented by an activation unit, and an activation function f is used to compute the unit's output. ReLU is a widely used activation function with multiple refinements: it is easy to compute, requires no pre-training of the neural network, and accelerates training. However, a ReLU activation unit is very fragile during training and may be "killed": a large gradient flowing into a ReLU unit may cause that unit never to activate again for any data point. The proposal of LReLU solves the fragility problem of ReLU, but its accuracy declines relative to ReLU. The neuron activation function AReLU proposed in the prior art combines high accuracy with high robustness, and its accuracy is also improved; the AReLU function is defined as shown in formula (1) below:
Figure 1 is a schematic diagram of the AReLU function in the prior art. As shown in Figure 1, when the positive and negative regions of the AReLU function share one identical parameter, the effect is unsatisfactory: if a is small, convergence in the positive region (x >= 0) slows down; if a is large, the principle of one-sided inhibition is violated and oscillation occurs. The prior art therefore further improves the AReLU activation function so that there is one hyper-parameter each for x >= 0 and x < 0, controlling the gradients of the positive and negative regions respectively; this is the logReLU function. Figure 2 is a schematic diagram of the logReLU function in the prior art. The logReLU function is defined as shown in formula (2) below:
Although a variety of deep-learning methods have been proposed in the prior art, deep learning by its nature requires massive data to train a model, and frameworks such as Caffe (Convolutional Architecture for Fast Feature Embedding) still face some unavoidable problems, for example in the processing of training-sample data, in the model-building process and its algorithms, and in model solving and application.
First, for training-sample data processing: because deep learning mostly studies complicated problems in fields such as image/video processing and natural-language processing, the problems involved are high-dimensional with huge amounts of information. Therefore, how the sample data are processed directly affects the abstract model that is built and the result of solving that model. Taking activation layers as an example, common existing activation functions include ReLU and Sigmoid, and other scholars have proposed activation functions such as PReLU and RReLU; however, none of these activation functions can perfectly cover the information of the sample data, and the resulting loss of information introduces errors into the established model, leading to incorrect model solutions.
Second, for the model-building process and its algorithms: the loss layers of traditional neural networks mostly use the SGD (Stochastic Gradient Descent) method, which establishes the model by iteratively minimizing the loss so as to summarize the data information. But this method has a defect: when processing high-dimensional information such as pictures, video, or natural language, a large number of parameters are generated; once the parameters reach a certain scale, the large amount of redundancy severely slows model building and solving and also degrades the quality of the established model.
In addition, for model solving and application: in the fully connected layer (inner product layer), in order to avoid over-fitting, i.e., the model relying too heavily on the training samples, random pruning is mostly used to remove the influence of certain parameters. But random pruning is, by nature, random and its result uncontrollable; if a more important parameter is cut off, the solving result of the model is strongly affected, directly harming the accuracy of the solution. The method is also poorly robust and carries a certain risk.
Invention content
The present invention provides a construction method for a network-connection-adaptive deep convolution model, to solve the serious data-redundancy problem of existing deep-learning methods and to improve the accuracy and efficiency of deep-convolution-model construction.
To solve the above problems, the present invention provides a construction method for a network-connection-adaptive deep convolution model, including the following steps:
Step 1: orthogonalize the weight vectors in the convolutional neural network, as shown in the following formulas (1) and (2):

w_i = v_i (1)

w_j = v_j - ((w_i^T v_j) / (w_i^T w_i)) * w_i (2)

In the formulas, v_i and v_j are respectively the i-th and j-th weight vectors of the same neural-network layer; w_i is the weight vector corresponding to v_i after orthogonalization, and w_j is the weight vector corresponding to v_j after orthogonalization with respect to w_i;
Step 2: delete inter-layer connections in the convolutional neural network according to a norm, as shown in the following formula (3):

if ||w_i||_p + ||w_j||_p = 0, then w_i = 0 (3)

In the formula, w_i and w_j are the i-th and j-th weight vectors belonging to the same neural-network layer, and ||w_i||_p is the p-norm of the i-th weight vector. After each weight-vector update, the p-norm of that weight vector is added to the p-norm of any other weight vector w_j of the same layer; if the sum is 0, the weight vector w_i is set to the zero vector;
Step 3: construct the activation function, as shown in the following formula (4):

f(x) = sigmoid(x) + λ (4)

In the formula, λ is calculated as follows. First set λ to an arbitrary value λ_0 and compute the loss-function value y_1; then add an increment ε to λ and compute the loss-function value y_2. If y_2 is greater than y_1, continue increasing λ and computing the loss-function value until the current loss value is less than the previous one, then stop iterating; the λ value at that point is the local optimum. If y_2 is less than y_1, continue decreasing λ and computing the loss-function value until the current loss value is less than the previously calculated one, then stop iterating; the λ value at that point is the local optimum. The λ value taken as the local optimum is substituted into formula (4) for calculation.
The construction method of the network-connection-adaptive deep convolution model provided by the present invention reduces data correlation through weight-vector orthogonalization, which reduces the parameter scale of the deep convolution model and improves its training speed and network efficiency. Deleting inter-layer connections based on the p-norm effectively reduces the problem of over-fitting during model construction and makes the network connections adaptive, i.e., so-called "data-driven", so that the finally established deep convolution model is more comprehensive and more credible. By processing the data samples, the activation function proposed by the present invention summarizes the data information as fully as possible without changing the parameter scale, improving the accuracy of the established deep convolution model.
Description of the drawings
Figure 1 is a schematic diagram of the AReLU function in the prior art;
Figure 2 is a schematic diagram of the logReLU function in the prior art;
Figure 3 is a flow chart of the construction method of the network-connection-adaptive deep convolution model in a specific embodiment of the present invention;
Figure 4 is a schematic diagram of the McCulloch-Pitts model in the prior art;
Figure 5A is a schematic diagram of the inter-layer weight connections in a traditional neural network;
Figure 5B is a schematic diagram of the connections in a neural network after sparse interaction between layers in the prior art;
Figures 6A and 6B are schematic diagrams of sigmoid functions;
Figure 7 is a frame diagram of the network-connection-adaptive deep convolution model built in a specific embodiment of the present invention.
Specific embodiment
A specific embodiment of the construction method of the network-connection-adaptive deep convolution model provided by the present invention is described in detail below with reference to the accompanying drawings.
Figure 3 is a flow chart of the construction method of the network-connection-adaptive deep convolution model in this specific embodiment. As shown in Figure 3, the construction method provided by this specific embodiment includes the following steps:
Step 1: orthogonalize the weight vectors in the convolutional neural network, as shown in the following formulas (5) and (6):

w_i = v_i (5)

w_j = v_j - ((w_i^T v_j) / (w_i^T w_i)) * w_i (6)

In the formulas, v_i and v_j are respectively the i-th and j-th weight vectors of the same neural-network layer; w_i is the weight vector corresponding to v_i after orthogonalization, and w_j is the weight vector corresponding to v_j after orthogonalization.
The basic unit of a convolutional neural network, the neuron, has the characteristic of multiple inputs and a single output. The classical neuron model, the McCulloch-Pitts model, was constructed from an information-processing viewpoint. Figure 4 is a schematic diagram of the McCulloch-Pitts model in the prior art. As shown in Figure 4, the j-th neuron receives the input signals x_i of multiple other neurons. Each synaptic strength is represented by a real coefficient: w_ij is the weight of the i-th neuron's action on the j-th neuron. A specific operation combines the effects of the input signals and gives their total effect, called the "net input", denoted Net_j or I_j. There are many types of input expression; the simplest form is a linear weighted sum:

Net_j = Σ_i w_ij x_i (7)

The net input causes a state change of neuron j, and the output y_j of neuron j is a function of its current state. The mathematical expression of the McCulloch-Pitts model is:

y_j = sgn(Σ_i w_ij x_i - θ_j) (8)

In formula (8), θ_j is a threshold and sgn is the sign function: when the net input exceeds the threshold, y_j outputs +1; otherwise it outputs -1. An artificial neural network formed by interconnecting a large number of such neurons exhibits several features of the human brain, including preliminary adaptivity and self-organization: the synaptic weights w_ij are changed during learning or training to adapt to the requirements of the surrounding environment.
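As a minimal sketch, the net-input and output rules of formulas (7) and (8) can be written directly; the weights, inputs, and threshold below are illustrative values, not taken from the patent.

```python
def mcculloch_pitts(inputs, weights, theta):
    """McCulloch-Pitts neuron: net input Net_j = sum_i w_ij * x_i (formula (7)),
    output y_j = sgn(Net_j - theta_j) (formula (8))."""
    net = sum(w * x for w, x in zip(weights, inputs))  # linear weighted sum
    return 1 if net - theta > 0 else -1                # +1 above threshold, else -1

# Illustrative call: net input 0.9 exceeds threshold 0.5, so the output is +1.
y = mcculloch_pitts([1, 0, 1], [0.4, 0.8, 0.5], theta=0.5)
```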
Each layer of a convolutional neural network has weight vectors, and these weight vectors determine the output of that layer; through the weight vectors of the trained data model, a convolutional neural network can reflect the features of the data well. These weight vectors are important to the precision of the whole model. Therefore, this specific embodiment improves the fitting of the weight vectors using the Gram-Schmidt orthogonalization process shown in formulas (5) and (6), i.e., weight-vector orthogonalization. The main purpose of orthogonalizing the weight vectors is to reduce the correlation between neuron weight connections. If (w_i)^T * w_j = 0, the i-th weight vector of a layer is orthogonal to the j-th weight vector of the same layer, and the correlation of the two vectors is minimal. Reducing the correlation between weight vectors through orthogonalization of the parameters allows the parameters to reflect the features of the data more effectively and achieves a more comprehensive reflection of those features.
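The Gram-Schmidt update of formulas (5) and (6) can be sketched in plain Python; the sample vectors below are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthogonalize a layer's weight vectors: w_i = v_i for the first vector,
    then each later v_j has its projection onto every previously orthogonalized
    w_i removed, i.e. w_j = v_j - (w_i^T v_j / w_i^T w_i) * w_i (formulas (5), (6))."""
    ws = []
    for v in vectors:
        w = list(v)
        for u in ws:
            coef = dot(u, v) / dot(u, u)  # projection coefficient w_i^T v_j / w_i^T w_i
            w = [wk - coef * uk for wk, uk in zip(w, u)]
        ws.append(w)
    return ws

ws = gram_schmidt([[1.0, 1.0], [1.0, 0.0]])
# After orthogonalization, dot(ws[0], ws[1]) is 0: the vectors are uncorrelated.
```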
Step 2: delete inter-layer connections in the convolutional neural network according to a norm, as shown in the following formula (9):

if ||w_i||_p + ||w_j||_p = 0, then w_i = 0 (9)

In the formula, w_i and w_j are the i-th and j-th weight vectors belonging to the same neural-network layer, and ||w_i||_p is the p-norm of the i-th weight vector. After each weight-vector update, the p-norm of that weight vector is added to the p-norm of any other weight vector w_j of the same layer; if the sum is 0, the weight vector w_i is set to the zero vector.
The norm is a basic concept in mathematics. In functional analysis, a norm is defined on a normed linear space and satisfies certain conditions: (1) non-negativity; (2) homogeneity; (3) the triangle inequality. It is often used to measure the length or size of each vector in a vector space (or of a matrix). Common norms are as follows:

1-norm: ||x||_1 = |x_1| + |x_2| + |x_3| + ... + |x_n| (10)

2-norm: ||x||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + ... + |x_n|^2)^(1/2) (11)

∞-norm: ||x||_∞ = max(|x_1|, |x_2|, ..., |x_n|) (12)
Norms on finite-dimensional spaces have good properties, mainly reflected in the following theorems:
Property 1: for any basis of a finite-dimensional normed linear space, the norm is a continuous function of the coordinates of an element (under that basis).
Property 2 (Minkowski's theorem): all norms on a finite-dimensional linear space are equivalent.
Property 3 (Cauchy convergence principle): a finite-dimensional linear space over the real (or complex) field is necessarily complete (under any norm).
Property 4: a sequence in a finite-dimensional normed linear space converges coordinate-wise if and only if it converges under any norm.
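The norms of formulas (10), (11) and (12) can be computed with one small helper:

```python
def norm(x, p):
    """p-norm of a vector: p=1 gives formula (10), p=2 gives formula (11),
    and p=float('inf') gives the infinity-norm of formula (12)."""
    if p == float('inf'):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
# norm(x, 1) = 7.0, norm(x, 2) = 5.0, norm(x, float('inf')) = 4.0
```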
Figure 5A is a schematic diagram of the inter-layer weight connections in a traditional neural network. As shown in Figure 5A, every neuron of one layer is connected to every neuron of the next layer. This connection mode causes the following defects in the model:
1. A single neuron connects to too many neurons, so the model has too many parameters to train and the computation is excessive.
2. Too many connections also make it easier to fall into a local optimum when training the neural network.
For this case, the dropout layer of a convolutional neural network adopts sparse interaction, i.e., randomly selected connections between neurons are deleted. Figure 5B is a schematic diagram of the connections between neural-network layers after sparse interaction in the prior art. Comparing Figure 5A with Figure 5B, it can be seen that the number of connections between layers is greatly reduced, which resolves the defects of excessive connections described above; however, sparse interaction removes inter-layer connections randomly and lacks a connection-deletion algorithm with a theoretical foundation.
According to the property that a norm is a function assigning a length to a vector, this specific embodiment computes the length of each vector through the norm; by setting to zero, i.e., deleting, the connections represented by vectors whose length is below some threshold, the number of parameters and the computation of the model can be effectively reduced. Compared with the random deletion of weight connections in sparse interaction, this specific embodiment deletes weight connections purposefully and can retain them selectively, making the deletion of weight connections controllable and providing it with a theoretical foundation.
This specific embodiment reduces the redundancy of weight connections through norm-based deletion of weight connections, while the weight-vector orthogonalization of step 1 mainly reduces the correlation of weight connections. Although the two are similar in that both enhance the learning ability of the neural network, they achieve this from two different aspects, and applying both to the neural network at the same time can fully promote the efficiency of neural-network learning.
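A minimal sketch of the norm-based connection deletion described above: weight vectors whose p-norm falls below a threshold are zeroed out. The threshold value here is an illustrative assumption; the patent does not fix one.

```python
def prune_by_norm(weight_vectors, p=2, threshold=1e-3):
    """Delete (zero out) weight vectors whose p-norm is below a threshold,
    so only connections carrying sufficient weight 'length' are kept.
    The threshold is an assumed illustrative parameter."""
    pruned = []
    for w in weight_vectors:
        length = sum(abs(v) ** p for v in w) ** (1.0 / p)
        pruned.append(list(w) if length >= threshold else [0.0] * len(w))
    return pruned

layer = [[0.9, -0.2], [1e-5, 2e-5], [0.0, 0.4]]
pruned = prune_by_norm(layer)  # the near-zero second vector is deleted
```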
Step 3: construct the activation function, as shown in the following formula (13):

f(x) = sigmoid(x) + λ (13)

In the formula, λ is calculated as follows. First set λ to an arbitrary value λ_0 and compute the loss-function value y_1; then add an increment ε to λ and compute the loss-function value y_2. If y_2 is greater than y_1, continue increasing λ and computing the loss-function value until the current loss value is less than the previous one, then stop iterating; the λ value at that point is the local optimum. If y_2 is less than y_1, continue decreasing λ and computing the loss-function value until the current loss value is less than the previously calculated one, then stop iterating; the λ value at that point is the local optimum. The λ value taken as the local optimum is substituted into formula (13) for calculation.
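The iterative λ search of step 3 can be sketched as a simple line search. The quadratic loss below is a stand-in for the real loss function, and the stopping rule (step in the direction of decreasing loss, stop once the loss stops improving) is one interpretation of the description above.

```python
def find_lambda(loss, lam0=0.0, eps=0.1, max_iter=1000):
    """Line search for a locally optimal lambda in f(x) = sigmoid(x) + lambda:
    probe lam0 + eps, choose the direction in which the loss decreases,
    then keep stepping until the loss no longer improves."""
    y1 = loss(lam0)
    y2 = loss(lam0 + eps)
    step = eps if y2 < y1 else -eps  # move toward decreasing loss
    lam, prev = lam0, y1
    for _ in range(max_iter):
        nxt = lam + step
        y = loss(nxt)
        if y >= prev:                # loss stopped improving: stop iterating
            break                    # lam is taken as the local optimum
        lam, prev = nxt, y
    return lam

# Stand-in loss with its minimum at lambda = 1.0 (illustrative only)
best = find_lambda(lambda l: (l - 1.0) ** 2)
```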
The rule by which a neuron produces an output signal under the action of input signals is given by the neuron's function f (activation function), also called the transfer function; this is the external characteristic of the neuron model. The activation function covers the whole process from input signal to net input, to activation value, to the final output signal. Because it combines with the net input, the activation function directly determines how the data are mapped, and different activation functions also influence the back-propagation algorithm through their different gradients. The function f takes many forms, and their different characteristics can be used to construct neural networks of various functionality.
Neuron activation functions commonly used in the art include the following:
1. Simple linear function
The function f is continuous-valued; the input x is weighted by the connection matrix W to produce the output.

f(x) = x (14)

2. Symmetric hard-limiting function
The symmetric hard-limiting function is a nonlinear model whose output takes only two values, such as +1 or -1 (or 1 and 0 for the hard-limiting function). When the net input exceeds a certain threshold θ, the output is +1; otherwise the output is -1. This effect can be represented by the sign function:

f(x) = sgn(x - θ) (15)

3. Sigmoid function
Figures 6A and 6B are schematic diagrams of sigmoid functions. The neuron output is a continuous non-decreasing function limited between two finite values, whose expression can be written as:

f(x) = tanh(x) (16)

The sigmoid curve of formula (16) is formed by the hyperbolic tangent function, whose maximum and minimum values are +1 and -1 respectively, as shown in Figure 6A. If the hyperbolic function is translated upward, a unipolar S-shaped function is obtained instead:

f(x) = 1 / (1 + e^(-x)) (17)

The curve's maximum and minimum values are then 1 and 0 respectively, as shown in Figure 6B.
4. Bipolar activation function
The output range of the general unipolar S-type activation function is 0 to 1, which is not optimal. To solve this problem, the input range can be changed to ±0.5 while biasing the S-type activation function, so that the output range of the node becomes [-0.5, 0.5]:

f(x) = -1/2 + 1/(1 + e^(-x)) (18)

In a BP (Back Propagation) algorithm using the bipolar activation function, the convergence speed roughly doubles.
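The conventional activation functions (14), (15), (17) and (18), together with the adaptive form of formula (13), can be sketched side by side; the λ value in the example call is only a placeholder.

```python
import math

def linear(x):                  # formula (14)
    return x

def hard_limit(x, theta=0.0):   # formula (15): sgn(x - theta), outputs +1 or -1
    return 1 if x - theta > 0 else -1

def sigmoid(x):                 # unipolar S-function, formula (17)
    return 1.0 / (1.0 + math.exp(-x))

def bipolar(x):                 # formula (18): output range [-0.5, 0.5]
    return -0.5 + sigmoid(x)

def adaptive(x, lam):           # formula (13): sigmoid(x) + lambda
    return sigmoid(x) + lam

# bipolar(0) == 0.0: the bipolar S-curve is centered on the origin.
```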
The above description of conventional activation functions shows the diversity and importance of activation functions and the influence of a good activation function on the learning efficiency of a neural network; finding better activation functions has therefore always been an important entry point for improving neural networks.
The adaptive activation function of formula (13) provided in this embodiment differs from conventional activation functions in that it is mainly data-driven: it takes the ReLU function as its basic structure while adding an undetermined parameter, whose value is produced by continually inputting data and changing the parameter until the parameter fits the data well. The fitted parameter can reflect the data features well. The variable in the adaptive activation function changes with the learning process, making learning more flexible and better embodying the adaptivity of the system.
Figure 7 is a frame diagram of the network-connection-adaptive deep convolution model built in this specific embodiment. The neural network established by this specific embodiment is shown in Figure 7, and ultimately forms the following optimization objective:

min softmax + λ_1 * Ω_1 + λ_2 * Ω_2 (19)

where softmax, Ω_1 and Ω_2 are explained as follows. The value of softmax is given by the loss function loss(x, y) together with the weight vectors and their connections;

Ω_1 = w_i^T * w_j (21)

removes the correlation between weights through weight-vector orthogonalization; and

Ω_2 = Σ_i ||w_i||_p (22)

is the p-norm-based deletion of weight connections, whose purpose is to remove the redundancy between weights. Substituting these terms, the optimization objective can be rewritten as formula (23), the objective to be optimized in the adaptive convolution model proposed by this specific embodiment. In it, loss(x, y) is the optimization objective of the conventional model, i.e., the loss function; λ_1 and λ_2 are respectively the proportions of the weight-vector orthogonalization term and the p-norm-based connection-deletion term, and the larger a parameter is set, the more pronounced the corresponding optimization constraint; w_i is the i-th weight vector of the neural network; and Σ_i ||w_i||_p is the sum of the p-norms of all weight vectors of the same layer.
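The combined objective of formula (19) can be sketched for a single pair of weight vectors; the base loss value and the coefficients λ_1 and λ_2 below are illustrative, not values fixed by the patent.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(base_loss, w_i, w_j, lam1=0.1, lam2=0.01, p=2):
    """Regularized objective of formula (19): base loss plus
    lam1 * Omega1 (orthogonality penalty w_i^T w_j, formula (21)) plus
    lam2 * Omega2 (sum of the p-norms of the layer's weight vectors)."""
    omega1 = dot(w_i, w_j)
    omega2 = sum(sum(abs(v) ** p for v in w) ** (1.0 / p) for w in (w_i, w_j))
    return base_loss + lam1 * omega1 + lam2 * omega2

# Orthogonal weight vectors contribute no Omega1 penalty:
val = objective(1.0, [1.0, 0.0], [0.0, 1.0])  # 1.0 + 0 + 0.01 * 2 = 1.02
```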
The traditional neural-network model is: convolutional layer + rectified-linear layer + pooling layer + fully connected layer. Similarly, the network-connection-adaptive deep convolution model provided by this specific embodiment can be based on the same layers, but in the core calculation functions and weight calculations of some layers the differences between this specific embodiment and the traditional neural-network model appear.
The weight-vector orthogonalization of step 1 and the p-norm-based deletion of weight connections of step 2 both optimize the calculation of the weight vectors, and this calculation takes place in the fully connected layer; therefore, by adding the constraints of this specific embodiment to the weight-vector calculation process of the fully connected layer, the calculated weight vectors better embody the features of the data.
The adaptive activation function built by this specific embodiment is located in the rectified-linear layer; therefore, replacing the traditional activation function with the adaptive activation function proposed by this specific embodiment achieves a better modeling effect.
This specific embodiment first optimizes the fully connected layer: it overcomes the randomness of dropout pruning by introducing the concept of a "p-norm adaptive model", measuring the importance of a connection by weight distance so as to realize intelligent pruning, increasing data validity and improving model accuracy. Second, in the adaptive design of the activation function: activation functions proposed so far are all designed by manually adjusting parameters, whereas this specific embodiment takes the model's automatic selection of the activation-function form as its breakthrough point, to find the optimal activation-function form for each stage. Finally, in data processing, to reduce the data volume and improve data validity, this specific embodiment proposes using the orthogonality of convolution kernels to judge correlation, ensuring the accuracy and efficiency of the final model by efficiently extracting data.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (1)
1. a kind of construction method of the adaptive depth convolution model of network connection, which is characterized in that include the following steps:
Step 1:Weight vectors in convolutional neural networks are orthogonalized, specific calculation formula such as following formula (1), (2) institute
Show:
wi=vi (1)
In formula, vi、vjRespectively i-th of weight vectors of same layer neural network, j-th of weight vectors, wiIt is viWith vjIt is orthogonal
Corresponding weight vectors, w after changejIt is vjWith viCorresponding weight vectors after orthogonalization;
Step 2:According to the deletion that norm connected between layers in convolutional neural networks, specific calculation formula is as follows
Shown in formula (3):
In formula, wiAnd wjIt is i-th of the weight vector and j-th of weight vector for belonging to one layer of neural network,It is i-th
The p norm values of weight vector, will be other of p norm values and this layer of the weight vector after the update of each weight vector
Anticipate weight vector wjP norm values be added, if that they are added and for 0, by wiWeight vector becomes null vector;
Step 3: construct the activation function, as shown in formula (4):

F(x) = sigmoid(x) + λ (4)

where λ is computed as follows: first set λ to an arbitrary initial value λ0 and compute the loss-function value y1; then add an increment ε to λ and compute the loss-function value y2. If y2 is greater than y1, continue to increase λ and compute the loss-function value until the current loss value is smaller than the previous one, then stop the iteration; the λ at that point is the local optimum. If y2 is smaller than y1, continue to decrease λ and compute the loss-function value until the current loss value is smaller than the previously computed one, then stop the iteration; the λ at that point is the local optimum. The λ value taken as the local optimum is then substituted into formula (4) for the calculation.
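Purely as an illustration, and not part of the claim language, the three steps above can be sketched in NumPy as follows. All function names are hypothetical; formula (2) is read as standard Gram-Schmidt orthogonalization, and the λ search of step 3 is implemented as a simple hill-climbing line search that steps λ while the loss keeps improving:

```python
import numpy as np

def orthogonalize_pair(v_i, v_j):
    """Step 1: orthogonalize two same-layer weight vectors.
    Formula (1): w_i = v_i; formula (2) read as Gram-Schmidt:
    w_j = v_j - (<w_i, v_j> / <w_i, w_i>) * w_i (v_i assumed nonzero)."""
    w_i = np.asarray(v_i, dtype=float).copy()
    v_j = np.asarray(v_j, dtype=float)
    w_j = v_j - (w_i @ v_j) / (w_i @ w_i) * w_i
    return w_i, w_j

def prune_by_p_norm(weights, p=2, tol=1e-12):
    """Step 2: for each pair (i, j) of weight vectors in a layer, if the sum
    of their p-norms is (numerically) zero, set w_i to the zero vector."""
    weights = np.asarray(weights, dtype=float).copy()
    norms = np.sum(np.abs(weights) ** p, axis=1) ** (1.0 / p)
    for i in range(len(weights)):
        for j in range(len(weights)):
            if i != j and norms[i] + norms[j] < tol:
                weights[i] = 0.0  # delete the connections of vector i
    return weights

def activation(x, lam):
    """Formula (4): F(x) = sigmoid(x) + lambda."""
    return 1.0 / (1.0 + np.exp(-x)) + lam

def tune_lambda(loss_fn, lam0=0.0, eps=0.01, max_iter=1000):
    """Step 3, read as hill climbing: pick the direction in which an eps step
    lowers the loss, keep stepping while the loss improves, and stop at the
    first non-improving step; the returned lambda is a local optimum."""
    lam, y = lam0, loss_fn(lam0)
    step = eps if loss_fn(lam0 + eps) < y else -eps
    for _ in range(max_iter):
        y_new = loss_fn(lam + step)
        if y_new >= y:  # loss stopped improving: local optimum reached
            break
        lam, y = lam + step, y_new
    return lam
```

For a toy quadratic loss such as `lambda lam: (lam - 0.5) ** 2` with `lam0 = 0`, `tune_lambda` walks λ upward in steps of ε and stops near the minimizer at 0.5.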
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711268262.6A CN108171319A (en) | 2017-12-05 | 2017-12-05 | The construction method of the adaptive depth convolution model of network connection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108171319A (en) | 2018-06-15 |
Family
ID=62524371
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711268262.6A Pending CN108171319A (en) | 2017-12-05 | 2017-12-05 | The construction method of the adaptive depth convolution model of network connection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171319A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109002890A (en) * | 2018-07-11 | 2018-12-14 | 北京航空航天大学 | The modeling method and device of convolutional neural networks model |
CN109344888A (en) * | 2018-09-19 | 2019-02-15 | 广东工业大学 | A kind of image-recognizing method based on convolutional neural networks, device and equipment |
CN109214353A (en) * | 2018-09-27 | 2019-01-15 | 云南大学 | A kind of facial image based on beta pruning model quickly detects training method and device |
CN109214353B (en) * | 2018-09-27 | 2021-11-23 | 云南大学 | Training method and device for rapid detection of face image based on pruning model |
US11366978B2 (en) | 2018-10-23 | 2022-06-21 | Samsung Electronics Co., Ltd. | Data recognition apparatus and method, and training apparatus and method |
CN110736707A (en) * | 2019-09-16 | 2020-01-31 | 浙江大学 | Spectrum detection optimization method for spectrum model transfer from master instruments to slave instruments |
CN110874574A (en) * | 2019-10-30 | 2020-03-10 | 平安科技(深圳)有限公司 | Pedestrian re-identification method and device, computer equipment and readable storage medium |
WO2021082078A1 (en) * | 2019-10-30 | 2021-05-06 | 平安科技(深圳)有限公司 | Pedestrian re-recognition method and apparatus, computer device, and readable storage medium |
TWI740726B (en) * | 2020-07-31 | 2021-09-21 | 大陸商星宸科技股份有限公司 | Sorting method, operation method and apparatus of convolutional neural network |
CN112598640A (en) * | 2020-12-22 | 2021-04-02 | 哈尔滨市科佳通用机电股份有限公司 | Water filling port cover plate loss detection method based on deep learning |
CN112598640B (en) * | 2020-12-22 | 2021-09-14 | 哈尔滨市科佳通用机电股份有限公司 | Water filling port cover plate loss detection method based on deep learning |
Similar Documents
Publication | Title |
---|---|
CN108171319A (en) | Construction method of a network-connection-adaptive deep convolution model |
CN109214566B (en) | Short-term wind-power prediction method based on a long short-term memory network |
CN107688850B (en) | Deep neural network compression method |
CN104751842B (en) | Optimization method and system for deep neural networks |
CN107688849A (en) | Dynamic-strategy fixed-point training method and device |
Alaloul et al. | Data processing using artificial neural networks |
CN108549658B (en) | Deep-learning video question-answering method and system based on an attention mechanism over a syntax-analysis tree |
CN107679618A (en) | Static-strategy fixed-point training method and device |
CN107301864A (en) | Deep bidirectional LSTM acoustic model based on Maxout neurons |
CN110321361B (en) | Test-question recommendation and judgment method based on an improved LSTM neural network model |
CN107679617A (en) | Iterative deep neural network compression method |
CN109829541A (en) | Incremental deep neural network training method and system based on learning automata |
CN108427665A (en) | Automatic text generation method based on an LSTM-type RNN model |
CN110222901A (en) | Deep-learning Bi-LSTM electric-load prediction method |
CN104636985A (en) | Method for predicting radio interference of power transmission lines using an improved BP neural network |
CN106970981B (en) | Method for building a relation-extraction model based on a transfer matrix |
CN108596327A (en) | Artificial-intelligence seismic velocity-spectrum picking method based on deep learning |
CN113392210A (en) | Text classification method and device, electronic equipment and storage medium |
CN110110372B (en) | Automatic segmentation prediction method for user time-series behavior |
Shi et al. | The prediction of character based on recurrent neural network language model |
CN110222844A (en) | Compressor performance prediction method based on artificial neural networks |
CN111382840B (en) | HTM design method for natural language processing based on recurrent learning units |
CN105550748A (en) | Method for constructing a novel neural network based on the hyperbolic tangent function |
CN113157919A (en) | Aspect-level sentiment classification method and system for sentence text |
CN109670169B (en) | Deep-learning sentiment classification method based on feature extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615 |