CN114612750B - Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment - Google Patents

Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment Download PDF

Info

Publication number
CN114612750B
CN114612750B CN202210499300.3A CN202210499300A CN114612750B CN 114612750 B CN114612750 B CN 114612750B CN 202210499300 A CN202210499300 A CN 202210499300A CN 114612750 B CN114612750 B CN 114612750B
Authority
CN
China
Prior art keywords
model
fusion
target
generation
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210499300.3A
Other languages
Chinese (zh)
Other versions
CN114612750A (en
Inventor
张宝昌
王滨
王润琪
吕金虎
王田
王星
张峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202210499300.3A priority Critical patent/CN114612750B/en
Publication of CN114612750A publication Critical patent/CN114612750A/en
Application granted granted Critical
Publication of CN114612750B publication Critical patent/CN114612750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The embodiment of the application provides a target identification method and device for adaptive learning rate collaborative optimization and electronic equipment. In this embodiment, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the model trained in each training batch and the previously trained model, so that forgetting of the model on previous knowledge can be reduced, loss oscillation in the neural network training process is avoided, and smooth convergence of the model is accelerated.

Description

Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a target identification method and apparatus for adaptive learning rate collaborative optimization, and an electronic device.
Background
Deep learning has demonstrated excellent performance in various different areas of image, speech, natural language processing, and the like. Taking face recognition as an example, the recognition rate and recognition speed of deep learning already exceed the speed of manual recognition. In addition, word recognition, text translation, and automated driving, which is now heavily studied, all benefit from the development of deep learning.
At present, although a reverse feedback (BP) algorithm with a good effect is adopted to calculate a gradient in a neural network model training process based on deep learning, oscillation loss still exists, such as slow training speed and incapability of converging the model. The loss shock can affect the training of the neural network model, and further affect the application of the neural network model, such as inaccuracy of recognized target objects (such as faces) and/or abnormal action behaviors (such as fire, robbery, fighting and stealing).
Disclosure of Invention
The embodiment of the application provides a target identification method and device for adaptive collaborative optimization and electronic equipment, so that loss oscillation in the training process of a neural network model is avoided by adaptively fusing a trained historical model, and the accuracy of target identification is improved.
The embodiment of the application provides a target identification method for adaptive collaborative optimization, which comprises the following steps:
under the condition that the model is trained to t generation, obtaining a target model obtained by training in t-1 generation
Figure 537459DEST_PATH_IMAGE001
(ii) a The object model
Figure 924578DEST_PATH_IMAGE002
Is a sub-model generated based on training data in each training batch in the t-1 generation
Figure 481330DEST_PATH_IMAGE003
And a reference model
Figure 903084DEST_PATH_IMAGE004
Determining a fusion model obtained by fusion; when the value t is greater than 2, the value,
Figure 903401DEST_PATH_IMAGE005
is a target model obtained by training in the t-2 generation;
based on a target model
Figure 777816DEST_PATH_IMAGE002
And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained
Figure 154571DEST_PATH_IMAGE006
(ii) a Wherein k represents a training batch;
according to sub-model
Figure 634093DEST_PATH_IMAGE007
With the object model
Figure 867629DEST_PATH_IMAGE002
Distance between them determining submodel
Figure 416291DEST_PATH_IMAGE006
With the object model
Figure 862315DEST_PATH_IMAGE002
Fusing weights of the fused target models, and combining the sub-models according to the target model fusion weights
Figure 258662DEST_PATH_IMAGE006
With the object model
Figure 600781DEST_PATH_IMAGE002
Fusing to obtain a fusion model;
combining the fusion model with the target model
Figure 184209DEST_PATH_IMAGE002
Combining to obtain a target model trained in the t generation
Figure 699504DEST_PATH_IMAGE008
To pass the trained target model
Figure 684778DEST_PATH_IMAGE009
And carrying out target identification.
The embodiment of the application provides a target identification device for adaptive collaborative optimization, which comprises:
an obtaining unit, configured to obtain the target model obtained through training in the t-1 generation when the model is trained to the t generation
Figure 197799DEST_PATH_IMAGE001
(ii) a The object model
Figure 268523DEST_PATH_IMAGE010
Is a sub-model generated based on training data in each training batch in the t-1 generation
Figure 790771DEST_PATH_IMAGE011
And a reference model
Figure 896131DEST_PATH_IMAGE004
Determining a fusion model obtained by fusion; when t is greater than 2, the process is repeated,
Figure 94900DEST_PATH_IMAGE005
is a target model obtained by training in the t-2 generation; and, based on the object model
Figure 590603DEST_PATH_IMAGE010
And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained
Figure 713280DEST_PATH_IMAGE007
(ii) a Wherein k represents a training batch;
a fusion unit for being dependent on the sub-model
Figure 407566DEST_PATH_IMAGE012
With the object model
Figure 262390DEST_PATH_IMAGE010
Distance between them determining submodel
Figure 573285DEST_PATH_IMAGE013
With the object model
Figure 968495DEST_PATH_IMAGE010
Fusing weights of fused target models, and sub-models are fused according to the target model fusion weights
Figure 720550DEST_PATH_IMAGE012
With the object model
Figure 74171DEST_PATH_IMAGE010
Fusing to obtain a fusion model;
a determination unit for combining the fusion model with the target model
Figure 341204DEST_PATH_IMAGE010
Combining to obtain a target model trained in the t generation
Figure 727055DEST_PATH_IMAGE009
To pass the trained target model
Figure 661513DEST_PATH_IMAGE009
And carrying out target identification.
The embodiment of the application also provides the electronic equipment. The electronic device includes: a processor and a machine-readable storage medium;
the machine-readable storage medium stores machine-executable instructions executable by the processor;
the processor is configured to execute machine-executable instructions to implement the steps of the above-disclosed method.
According to the technical scheme, in the embodiment of the application, in the model training process, the sub-model trained in each training batch is adaptively fused with the target model trained in the previous generation according to the difference between the sub-model trained in each training batch and the target model trained in the previous generation, and the target model trained to the current generation is determined according to the finally fused model, so that the target model trained in any generation can inherit the information of the target model trained in the previous generation and the sub-model trained in each training batch, the forgetting of the previous knowledge by the target model trained at present is reduced, the loss oscillation in the neural network model training process is avoided, and the accuracy of target identification is improved;
further, in the embodiment, by adaptively fusing the model trained in each training batch with the target model trained in the previous generation, after iteration is performed for a sufficient number of times, it is finally found that the difference between the sub-model trained in the training batch and the target model trained in the previous generation is very small and basically tends to be consistent, so that the problem of extreme learning rate is avoided, and the generalization performance of the method provided by the embodiment of the present application is ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of a method provided by an embodiment of the present application;
FIG. 2 is a block diagram of an apparatus according to an embodiment of the present disclosure;
fig. 3 is a structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
In the conventional model training at present, no matter which generation the model is trained to, the model is directly updated on the newly trained model at present. For example, when the model is trained to t generation, the model trained in t-1 generation (i.e. the target model trained in t-1 generation) is directly updated, and when the model is trained to t +1 generation, the model trained in t generation (i.e. the target model trained in t generation) is directly updated, and so on.
However, it has been found that such direct updating of the currently newly trained model during the model training iteration is ideal. In the training process of the current model, due to the inverted shooting inhibition, catastrophic forgetfulness to a previously trained model, such as a model characteristic parameter, often occurs, so that a later trained model and a previously trained model compete with each other, and the previous learning is interfered.
In order to solve the above technical problems, this embodiment breaks through the conventional model training manner, and adaptively fuses the model trained in each training batch with the previously trained model according to the difference between the model trained in each training batch and the previously trained model during model training, so as to avoid the above feature parameter forgetting problem, accelerate model convergence, avoid loss oscillation during neural network model training, and further improve the accuracy of target identification.
The method provided by the embodiments of the present application is described below:
referring to fig. 1, fig. 1 is a flowchart of a method provided in an embodiment of the present application. The process is applied to the electronic equipment. Optionally, as an embodiment, the electronic device may be an internet of things terminal device such as a video terminal and a door access device, which is applied to the neural network model. As another embodiment, the electronic device may also be a backend device such as a server, and the embodiment is not limited in particular.
As shown in fig. 1, the process may include the following steps:
and 101, under the condition that the model is trained to t generation, obtaining a target model obtained by training in t-1 generation.
For convenience of description, in this embodiment, the target model obtained by training in the t-1 generation is denoted as
Figure 123718DEST_PATH_IMAGE014
In the present embodiment, the object model
Figure 878048DEST_PATH_IMAGE010
Is a sub-model generated based on training data in each training batch in the t-1 generation
Figure 83901DEST_PATH_IMAGE015
And a reference model
Figure 76128DEST_PATH_IMAGE005
And determining a fusion model obtained by fusion. When the value t is greater than 2, the value,
Figure 505972DEST_PATH_IMAGE004
the target model obtained by training in the t-2 generation. Of course, when t =2,
Figure 200128DEST_PATH_IMAGE005
is the initial model at the beginning of the entire model training. In this embodiment, the initial model is not particularly limited, and may be, for example, a fully-connected network, a convolutional neural network, a cyclic neural network, or the like. As for the network structure, the embodiment is not limited, for example, taking a fully connected network as an example, the model may include: the device comprises an input layer, a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer and an output layer.
As for the object model
Figure 740830DEST_PATH_IMAGE002
The training mode is similar to how the target model of t generation is trained (noted as
Figure 384301DEST_PATH_IMAGE009
). For the moment, this description will not be repeated.
And 102, acquiring a sub-model generated based on training data in each training batch when the generation t is based on the target model obtained by training the generation t-1.
In application, training data used for model training is relatively large, and a whole set of training data is often divided into a plurality of batches for training. For example, there are 50000 training data, and 50000 training data can be divided into 10 training batches, each training batch containing 5000 training data.
It should be noted that, this embodiment does not specifically limit the training data in each training batch, and may determine the training data according to a target to be recognized, for example, the recognition target is a human face, a verification code reading, and the like, and may use feature vector data obtained by processing corresponding picture data as the training data, and the like, which is not specifically limited in this embodiment.
In this embodiment, for the first training batch, the training data of the first training batch is used to directly train the target model obtained in the t-1 generation
Figure 188309DEST_PATH_IMAGE002
Performing model updating to obtain
Figure 651652DEST_PATH_IMAGE016
For the second training batch, directly pairing the training data in the second training batch
Figure 261625DEST_PATH_IMAGE016
Updating the model to obtain
Figure 962864DEST_PATH_IMAGE017
Repeating the steps until the last training batch n is obtained, and obtaining n-1 training batches by using the training data in the nth training batch
Figure 734511DEST_PATH_IMAGE018
Updating the model to obtain
Figure 153991DEST_PATH_IMAGE019
. Finally, a sub-model generated based on training data in each training batch when t generation is obtainedModel (III)
Figure 302076DEST_PATH_IMAGE020
And k represents a training batch.
Figure 196888DEST_PATH_IMAGE007
Represented by the formula:
Figure 77119DEST_PATH_IMAGE021
}。
and 103, determining a target model fusion weight for fusion of the sub-model and the target model obtained by training in the t-1 generation according to the distance between the sub-model generated based on the training data in each training batch and the target model obtained by training in the t-1 generation during the t generation, and fusing the sub-model and the target model obtained by training in the t-1 generation according to the target model fusion weight to obtain a fusion model.
Optionally, in this embodiment, the sub-model generated based on the training data in each training batch at the time of generation t can be used
Figure 780633DEST_PATH_IMAGE006
And
Figure 466829DEST_PATH_IMAGE002
the Manhattan distance between the two models is used as an adaptive weight parameter for weighted fusion of the models so as to realize that the finally obtained fusion model completes the deduction from an individual to the whole.
Based on the above description, follow the sub-model
Figure 877082DEST_PATH_IMAGE020
With the object model
Figure 990531DEST_PATH_IMAGE002
The manhattan distance between them, a candidate model fusion weight is generated as represented by:
Figure 181341DEST_PATH_IMAGE022
wherein the content of the first and second substances,
Figure 608912DEST_PATH_IMAGE023
represent
Figure 654097DEST_PATH_IMAGE024
And
Figure 204027DEST_PATH_IMAGE025
the manhattan distance between the two electrodes,
Figure 554237DEST_PATH_IMAGE026
represents
Figure 51077DEST_PATH_IMAGE024
And
Figure 498239DEST_PATH_IMAGE027
manhattan distance between them, M represents the candidate model fusion weight.
After the candidate model fusion weight is obtained, as one embodiment, the candidate model fusion weight may be taken as the target model fusion weight. As another embodiment, in order to ensure that the sum of the weight parameters in each row is close to a set value, such as 1, each row in the candidate model fusion weights may be transformed to obtain the target model fusion weight. Wherein the transition is used to control the sum of the weight parameters in each row to approach a set value. In a specific implementation, for example, the conversion may be implemented by a normalized exponential function (softmax), and the like, and the embodiment is not particularly limited.
Corresponding to the candidate model fusion weight, the final target model fusion weight in this embodiment can be represented by the following formula:
Figure 953491DEST_PATH_IMAGE028
wherein the content of the first and second substances,
Figure 56576DEST_PATH_IMAGE029
is through pair
Figure 622687DEST_PATH_IMAGE030
The light-emitting diode is obtained by conversion,
Figure 924355DEST_PATH_IMAGE031
is through pair
Figure 3039DEST_PATH_IMAGE026
The light-emitting diode is obtained by conversion,
Figure 390158DEST_PATH_IMAGE032
representing the target model fusion weights.
After the target model fusion weight is obtained, as described in step 103, the sub-models can be combined according to the target model fusion weight
Figure 759959DEST_PATH_IMAGE006
With the object model
Figure 119396DEST_PATH_IMAGE002
And fusing to obtain a fusion model.
As one embodiment, sub-models are fused with weights according to the target model
Figure 182030DEST_PATH_IMAGE020
And object model
Figure 790866DEST_PATH_IMAGE001
When merging, one row of elements may be selected from the target model fusion weights first, and based on the target model fusion weights described above, the last row of elements in the target model fusion weights may be selected (because the elements embody the target model trained with t-1 generation)
Figure 367953DEST_PATH_IMAGE001
Manhattan distance between). Thereafter, the selected elements are used as sub-models
Figure 96744DEST_PATH_IMAGE020
With the object model
Figure 533541DEST_PATH_IMAGE001
The fused weight, based on the object model fusion weight described above, may be, for example, the first element of the last line of the selection as a sub-model
Figure 98515DEST_PATH_IMAGE016
And object model
Figure 75698DEST_PATH_IMAGE002
The fused weight can use the second element of the last line of the selection as a sub-model
Figure 409728DEST_PATH_IMAGE017
And object model
Figure 548585DEST_PATH_IMAGE002
The fused weight is analogized in turn. Then, based on the sub-model
Figure 132013DEST_PATH_IMAGE006
With the object model
Figure 99838DEST_PATH_IMAGE002
Fused weights, sub-models
Figure 85111DEST_PATH_IMAGE020
With the object model
Figure 660449DEST_PATH_IMAGE001
And fusing to obtain a fused model.
In this embodiment, if the last row of elements in the fusion weight of the target model is selected, the fusion model can be represented by the following formula:
Figure 668856DEST_PATH_IMAGE033
wherein the content of the first and second substances,
Figure 722263DEST_PATH_IMAGE034
representing the fusion model, k represents the training batch, when k = n +1,
Figure 827622DEST_PATH_IMAGE035
=
Figure 777124DEST_PATH_IMAGE001
it should be noted that, in this embodiment, the last row of elements in the target model fusion weight is selected because all the last row of elements exhibit the target model trained in the t-1 generation
Figure 538406DEST_PATH_IMAGE002
The Manhattan distance between the two sub-models ensures the sub-model trained currently
Figure 395504DEST_PATH_IMAGE007
(i.e., the task-specific model) can absorb as much of the base model as possible
Figure 807900DEST_PATH_IMAGE001
Knowledge of (a). In the present embodiment of the present invention,
Figure 725040DEST_PATH_IMAGE036
the method can be used as an adaptive parameter in a model learning process, and helps the model to complete optimization more robustly.
And 104, combining the fusion model with the target model obtained by training in the t-1 generation to obtain a target model trained in the t generation, and performing target identification through the target model trained in the t generation.
In this embodiment, the target model trained in the t generation can be written as
Figure 708040DEST_PATH_IMAGE009
In the embodiment, the use of the fusion model is introduced when the fusion model is combined with the target model obtained by training in the t-1 generationOver-parameters in controlling learning speed
Figure 368828DEST_PATH_IMAGE037
. For example, the fusion model is combined with the target model obtained by training in the t-1 generation according to the following formula:
Figure 386463DEST_PATH_IMAGE038
here, the number of the first and second electrodes,
Figure 208925DEST_PATH_IMAGE037
it may be preset, such as to 0.1. The present embodiment is not particularly limited.
Due to the above candidate fusion weight
Figure 741538DEST_PATH_IMAGE039
Or the object model fusion weights regularization, such that after a sufficient number of iterations, the task-specific model type
Figure 940438DEST_PATH_IMAGE040
And a basic model
Figure 812579DEST_PATH_IMAGE041
The difference between them is gradually reduced, and the task is specific to the model type
Figure 71522DEST_PATH_IMAGE040
And basic model
Figure 278381DEST_PATH_IMAGE041
The weights of (a) and (b) tend to be consistent. When all models are task-specific models
Figure 280972DEST_PATH_IMAGE040
And a basic model
Figure 69937DEST_PATH_IMAGE041
Ideally optimized to the same model, which share the same knowledge, thereby eliminating task specific modelsInformation is lost in type fusion. By the model fusion method, the forgetting of the model to the previous knowledge can be reduced, so that loss oscillation in the neural network training process is avoided, and the smooth convergence of the model is accelerated.
In addition, the target recognition mentioned in step 104 may be used for target recognition of pictures or videos. For example, for face recognition, step 104 may be implemented by training the target model
Figure 437464DEST_PATH_IMAGE009
Carrying out face recognition on the face picture; as another example, when applied to the recognition of objects, such as vehicles, the step 104 may be implemented by training the object model
Figure 679090DEST_PATH_IMAGE009
Performing target correspondence such as vehicle recognition on a target object picture such as a vehicle picture; for example, the method can be applied to abnormal behavior recognition (such as robbery, fire, fighting, etc.), and the step 104 can be implemented by trained target models
Figure 485372DEST_PATH_IMAGE009
The video image is subjected to abnormal behavior identification, and this embodiment is not particularly limited.
In this embodiment, the trained object model is used as the target model
Figure 863263DEST_PATH_IMAGE042
Also updated on demand, e.g. in the target model
Figure 667271DEST_PATH_IMAGE043
When the number of the false alarms reaches a preset threshold value, automatically updating; as another example, manual updates and the like are performed based on external triggers such as maintenance personnel and the like. The present embodiment is not particularly limited. As for the updating manner, the flow shown in fig. 1 can be returned.
The flow shown in fig. 1 is thus completed.
As can be seen from the flow shown in fig. 1, in the present embodiment, in the model training process, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the model trained in each training batch and the previously trained model, so that the forgetting of the model to the previous knowledge can be reduced, thereby avoiding the loss oscillation in the neural network training process and accelerating the smooth convergence of the model.
The sub-model generated for the training data in each training batch is as follows
Figure 396193DEST_PATH_IMAGE044
The description is that:
in the present embodiment, the sub-models are generated based on the training data in each training batch
Figure 740587DEST_PATH_IMAGE020
Is evaluated by the constructed loss function. Here, the loss function will be described as being based on a sub-model
Figure 176247DEST_PATH_IMAGE020
With the object model
Figure 213473DEST_PATH_IMAGE001
The regularization function of the Manhattan distance and the Hamilton energy loss function are two parts; the Hamiltonian energy loss function is used to help the trained model approach the theoretically optimal model from an energy perspective.
Alternatively, in this embodiment, the loss function can be represented by the following formula:
Figure 616642DEST_PATH_IMAGE045
wherein, the first and the second end of the pipe are connected with each other,
Figure 499147DEST_PATH_IMAGE046
representing the hamiltonian energy loss function,
Figure 117210DEST_PATH_IMAGE047
for assisting in predicting values fromThe energy angle approaches the theoretical optimal label value;
Figure 263021DEST_PATH_IMAGE048
to represent
Figure 700955DEST_PATH_IMAGE049
And
Figure 652731DEST_PATH_IMAGE050
the Manhattan distance between the two, a and b are [1, n +1 ]],
Figure 62983DEST_PATH_IMAGE051
=
Figure 176433DEST_PATH_IMAGE052
The following describes a hamiltonian energy loss function related to an embodiment of the present application:
in model training, the following problems often occur: on one hand, training data in different training batches have different distributions, and the different distributions bring difficulty to the training of the whole network; on the other hand, after the training data in the training batch is calculated by a layer-by-layer network, the data distribution of the training data also changes (this phenomenon is called data distribution drift). In order to solve the above problem, the model trained in this embodiment at least includes a batch normalization (batch norm) layer.
The batch normalization layer performs corresponding operations through the following formula:
Figure 367243DEST_PATH_IMAGE053
wherein the content of the first and second substances,
Figure 591551DEST_PATH_IMAGE054
indicates the batch size (batch size),
Figure 121889DEST_PATH_IMAGE055
in order to be an input, the user can select,
Figure 406240DEST_PATH_IMAGE056
is the average value of the batches,
Figure 818767DEST_PATH_IMAGE057
is the variance of the batch,
Figure 311014DEST_PATH_IMAGE058
is a minimum value to prevent points from being scored
Figure 758176DEST_PATH_IMAGE059
It is not meaningful to have the information that,
Figure 213428DEST_PATH_IMAGE060
and
Figure 50934DEST_PATH_IMAGE061
as a trainable parameter of the batch specification layer,
Figure 882624DEST_PATH_IMAGE062
is the output.
In the model, operations also need to be performed on the output of the batch specification layer, model parameters, and the like. Taking the convolution operation performed on the output of the batch specification layer and the model parameter as an example, the convolution operation performed on the output of the batch specification layer and the model parameter is expressed as follows:
Figure 184292DEST_PATH_IMAGE063
. Wherein the content of the first and second substances,
Figure 13708DEST_PATH_IMAGE064
the parameters of the model are represented by,
Figure 400827DEST_PATH_IMAGE065
representing the output of the batch specification layer. In application, operations are performed on the output of the batch specification layer and model parameters, etc., which constitute parameters in the physical structure
Figure 770629DEST_PATH_IMAGE066
The parameter coupling problem exists in the model training process.
For the parameter coupling problem existing in the model training process, if the same gradient is reduced on the parameter, potential energy loss is caused. In order to ensure the minimum energy loss in the model training process and maintain the overall performance of the model, a hamilton energy loss function is introduced in the embodiment. Coupling relations or cooperation of different parameters are considered through a Hamilton energy loss function, the gradient is ensured to descend along the terrain constrained by the different parameters on the premise of following the minimum energy loss principle, and the coupling relations among the different parameters are fully considered.
Alternatively, in the present embodiment, the hamiltonian energy loss function can be represented by the following formula:
Figure 130066DEST_PATH_IMAGE067
wherein the content of the first and second substances,
Figure 192700DEST_PATH_IMAGE068
representing the hamiltonian energy loss function,
Figure 254066DEST_PATH_IMAGE069
is a variable of the first order and is,
Figure 427558DEST_PATH_IMAGE064
the parameters of the model are represented by,
Figure 438239DEST_PATH_IMAGE070
representing output values with respect to a model
Figure 609458DEST_PATH_IMAGE065
And model parameters
Figure 971169DEST_PATH_IMAGE064
A co-optimized function.
As an example, in particular implementations,
Figure 682773DEST_PATH_IMAGE071
can be proportional controlSystem-integral-control-differential-control (PID) equation representation, e.g.
Figure 751223DEST_PATH_IMAGE071
Comprises the following steps:
Figure 155660DEST_PATH_IMAGE072
where lr represents an initial learning rate,
Figure 739088DEST_PATH_IMAGE073
representing the loss gradient of the model training to the t generation,
Figure 457645DEST_PATH_IMAGE074
representing a scaling parameter.
Figure 177339DEST_PATH_IMAGE075
Representing the output value of the model during the training of the model in the ith training batch
Figure 752677DEST_PATH_IMAGE065
With corresponding tag value
Figure 275931DEST_PATH_IMAGE076
The cross entropy of (2) is used for measuring the accuracy of model training.
Optionally, in the present embodiment, can be directed to
Figure 329338DEST_PATH_IMAGE077
The following settings were made:
Figure 434697DEST_PATH_IMAGE078
to make
Figure 118619DEST_PATH_IMAGE077
And as an auxiliary variable of Legendre change, converting a cross entropy equation into a coordinate system, decoupling coupled parameters, and reducing the energy loss in the model training process to the minimum to ensure the overall performance of the model.
As an example, when addressing
Figure 676640DEST_PATH_IMAGE079
When the above-described setting is performed, correspondingly, in the present embodiment,
Figure 533737DEST_PATH_IMAGE080
Figure 431286DEST_PATH_IMAGE060
the update rules of (1) are respectively:
Figure 348426DEST_PATH_IMAGE081
Figure 393743DEST_PATH_IMAGE082
wherein the content of the first and second substances,
Figure 992214DEST_PATH_IMAGE083
the representation is based on
Figure 806587DEST_PATH_IMAGE084
Determining
Figure 894628DEST_PATH_IMAGE085
A learning rate of time, said
Figure 161662DEST_PATH_IMAGE085
Representing the model parameters when the model was trained to the t +1 generation,
Figure 813092DEST_PATH_IMAGE084
representing model parameters when the model is trained to t generation;
Figure 747550DEST_PATH_IMAGE086
the representation is based on
Figure 740914DEST_PATH_IMAGE087
Determining
Figure 698505DEST_PATH_IMAGE088
A learning rate of time, said
Figure 701096DEST_PATH_IMAGE088
Representing the scaling parameters when the model was trained to the t +1 generation,
Figure 224482DEST_PATH_IMAGE087
representing the scaling parameters when the model was trained to the t generation,
Figure 857588DEST_PATH_IMAGE089
representing the hamiltonian energy loss function when the model is trained to the t generations.
Finally, with the loss function, when model training is performed based on the training data in each training batch, if the loss function calculated based on the output value (also referred to as predicted value) and the label value (also referred to as true value) of the current model is found to be the minimum, the model training of the training batch is ended, and the model at that time is marked as the sub-model generated by the training batch
Figure 364793DEST_PATH_IMAGE020
In the early stage of training, the output value of the loss function is changed greatly, namely the loss gradient is large, and correspondingly the learning step length is large, so that the model can be allowed to change in a large range, and the optimization of the weight parameters is accelerated. When training tends to be stable, the output value change of the loss function becomes small, namely the loss gradient is small, the learning step length tends to be fixed, the problem of extreme learning rate is avoided, the generalization performance of the optimizer algorithm is ensured, and the advantages of the optimizer algorithm are exerted.
Further, in the present embodiment, by
Figure 639916DEST_PATH_IMAGE090
As an auxiliary variable of Legendre change, a cross entropy equation is converted into a coordinate system, coupled parameters are decoupled, and a model is trainedThe energy loss in the process is reduced to the minimum, the model is updated more effectively, and the overall performance of the model is ensured.
For ease of understanding, the present embodiment uses a CIFAR dataset to test a network trained by the existing method and a network trained by the method provided in the embodiments of the present application.
In the present embodiment, CIFAR data sets are divided into CIFAR-10 and CIFAR-100 data sets. The CIFAR-10 dataset consists of 10 classes, each of which has 60000 32x32 color images (of which there are 50,000 training images and 10,000 test images). The CIFAR-100 dataset consists of 100 classes, each containing 600 32x32 images (of which there are 500 training images and 100 test images).
The present embodiment uses the above CIFAR dataset for each of the following networks: ResNet18, ResNet34, Vgg19, PreActResNet18, DenseNet121, GoodLeNet test model training according to a conventional method (taking Adam as an example), and test model training according to the method provided by the embodiment of the application, and Table 1 shows the average accuracy of the test and the epoch which reaches the highest accuracy for the first time:
Figure 486650DEST_PATH_IMAGE091
as can be seen from table 1, the method provided in the embodiment of the present application has higher accuracy on each network, and the training convergence is improved by 9% on average. Therefore, compared with the Adam optimizer algorithm, the method provided by the embodiment of the application can be proved to have the effects of reducing overshoot and accelerating parameter convergence on the premise of ensuring the accuracy.
The method provided by the embodiment of the application is described above. The following describes the apparatus provided in the embodiments of the present application:
referring to fig. 2, fig. 2 is a structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 2, the apparatus may include:
an obtaining unit, configured to obtain the target model obtained by training the t-1 generation when the model is trained to the t generation
Figure 87395DEST_PATH_IMAGE001
(ii) a The object model
Figure 816317DEST_PATH_IMAGE002
Is a sub-model generated based on training data in each training batch in the t-1 generation
Figure 347661DEST_PATH_IMAGE003
And a reference model
Figure 111218DEST_PATH_IMAGE004
Determining a fusion model obtained by fusion; when t is greater than 2, the process is repeated,
Figure 882865DEST_PATH_IMAGE005
is a target model obtained by training in the t-2 generation; and, based on the object model
Figure 36766DEST_PATH_IMAGE002
And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained
Figure 184850DEST_PATH_IMAGE007
(ii) a Wherein k represents a training batch;
a fusion unit for determining sub-models
Figure 537334DEST_PATH_IMAGE007
And object model
Figure 683145DEST_PATH_IMAGE002
Distance between them determining submodel
Figure 386658DEST_PATH_IMAGE006
With the object model
Figure 807276DEST_PATH_IMAGE002
Fusing weights of fused target models, and sub-models are fused according to the target model fusion weights
Figure 483108DEST_PATH_IMAGE007
With the object model
Figure 862136DEST_PATH_IMAGE002
Fusing to obtain a fusion model;
a determination unit for combining the fusion model with the target model
Figure 787367DEST_PATH_IMAGE092
Combining to obtain a target model trained in the t generation
Figure 464205DEST_PATH_IMAGE093
To pass the trained target model
Figure 791281DEST_PATH_IMAGE094
And carrying out target identification.
Optionally, the dependency sub-model
Figure 75632DEST_PATH_IMAGE020
And object model
Figure 691421DEST_PATH_IMAGE002
Distance between them determining submodel
Figure 719420DEST_PATH_IMAGE007
With the object model
Figure 166582DEST_PATH_IMAGE002
The fused target model fusion weights include:
according to sub-model
Figure 559517DEST_PATH_IMAGE007
And object model
Figure 459340DEST_PATH_IMAGE010
Generating candidate model fusion weight according to the Manhattan distance; wherein the candidate model fusion weight is given byRepresents:
Figure 291029DEST_PATH_IMAGE095
wherein the content of the first and second substances,
Figure 795960DEST_PATH_IMAGE030
represents
Figure 156534DEST_PATH_IMAGE024
And
Figure 809233DEST_PATH_IMAGE025
the manhattan distance between them,
Figure 631564DEST_PATH_IMAGE026
represents
Figure 787739DEST_PATH_IMAGE024
And
Figure 584793DEST_PATH_IMAGE027
m represents the candidate model fusion weight;
converting each line in the candidate model fusion weight to obtain a target model fusion weight; the transition is used to control the sum of the weight parameters in each row to approach a set value.
Optionally, the converting each row of the candidate model fusion weights comprises:
each row in the candidate model fusion weight is transformed by using a normalized index softmax function, so that the sum of weight parameters in each transformed row is close to 1.
Optionally, the target model fusion weight is represented by:
Figure 396892DEST_PATH_IMAGE096
wherein the content of the first and second substances,
Figure 835963DEST_PATH_IMAGE029
is formed by
Figure 846645DEST_PATH_IMAGE023
The light-emitting diode is obtained by conversion,
Figure 17863DEST_PATH_IMAGE031
is formed by
Figure 379574DEST_PATH_IMAGE026
The light-emitting diode is obtained by conversion,
Figure 91178DEST_PATH_IMAGE097
representing target model fusion weights;
the sub-model is fused according to the target model fusion weight
Figure 159628DEST_PATH_IMAGE020
With the object model
Figure 564065DEST_PATH_IMAGE001
The fusion model obtained by the fusion comprises the following steps:
selecting elements of a last line from the target model fusion weights;
using the selected elements as sub-models
Figure 147493DEST_PATH_IMAGE020
With the object model
Figure 849739DEST_PATH_IMAGE002
Fused weights, sub-models
Figure 100591DEST_PATH_IMAGE007
And object model
Figure 410350DEST_PATH_IMAGE002
And fusing to obtain a fusion model.
Optionally, the fusing model and the target model are combined
Figure 684336DEST_PATH_IMAGE098
Combining to obtain a t-generation trained target model
Figure 737743DEST_PATH_IMAGE099
The method comprises the following steps:
the fusion model and the target model are combined according to the following formula
Figure 843102DEST_PATH_IMAGE002
Carrying out combination:
Figure 527025DEST_PATH_IMAGE100
wherein the content of the first and second substances,
Figure 819466DEST_PATH_IMAGE037
for the defined hyper-parameters for controlling the learning speed,
Figure 942142DEST_PATH_IMAGE034
representing the fusion model.
Optionally, a sub-model generated based on training data in each training batch
Figure 839691DEST_PATH_IMAGE020
Is evaluated by a constructed loss function; the loss function will be described as being based on sub-models
Figure 491253DEST_PATH_IMAGE020
With the object model
Figure 802148DEST_PATH_IMAGE002
The regularization function of the Manhattan distance and the Hamilton energy loss function are two parts; the Hamiltonian energy loss function is used for helping the trained model to approach the theoretical optimal model from the energy perspective.
Optionally, the loss function is represented by the following formula:
Figure 649887DEST_PATH_IMAGE101
wherein, the first and the second end of the pipe are connected with each other,
Figure 464260DEST_PATH_IMAGE046
representing the function of the hamiltonian energy loss,
Figure 552301DEST_PATH_IMAGE047
the label value is used for helping the predicted value to approach the theoretical optimal label value from the energy angle;
Figure 22597DEST_PATH_IMAGE048
to represent
Figure 487076DEST_PATH_IMAGE049
And
Figure 890376DEST_PATH_IMAGE050
the Manhattan distance between the two, a and b are [1, n +1 ]],
Figure 352581DEST_PATH_IMAGE051
=
Figure 106911DEST_PATH_IMAGE052
Optionally, the hamiltonian energy loss function is represented by:
Figure 109502DEST_PATH_IMAGE102
wherein the content of the first and second substances,
Figure 836149DEST_PATH_IMAGE103
representing the hamiltonian energy loss function,
Figure 265993DEST_PATH_IMAGE104
is a variable of the first order and is,
Figure 507619DEST_PATH_IMAGE064
the parameters of the model are represented by a number,
Figure 48322DEST_PATH_IMAGE070
representing output values with respect to a model
Figure 144323DEST_PATH_IMAGE065
And model parameters
Figure 745068DEST_PATH_IMAGE064
A co-optimized function.
Optionally, the
Figure 677252DEST_PATH_IMAGE071
Expressed by a proportional control-integral control-derivative control PID equation, the
Figure 756067DEST_PATH_IMAGE105
Represented by the formula:
Figure 519623DEST_PATH_IMAGE106
wherein, lr represents an initial learning rate,
Figure 228953DEST_PATH_IMAGE073
representing the loss gradient of the model training to the t generation,
Figure 710750DEST_PATH_IMAGE074
a scaling parameter is represented that is representative of,
Figure 593256DEST_PATH_IMAGE075
representing the output value of the model during the training of the model in the ith training batch
Figure 149002DEST_PATH_IMAGE065
With corresponding tag value
Figure 357129DEST_PATH_IMAGE076
Cross entropy of (1) for measuring accuracy of model training;
Wherein, when setting as follows for p:
Figure 247594DEST_PATH_IMAGE107
then, then
Figure 933790DEST_PATH_IMAGE064
Figure 406360DEST_PATH_IMAGE074
The update rules of (a) are respectively:
Figure 457492DEST_PATH_IMAGE108
Figure 648302DEST_PATH_IMAGE109
Figure 75872DEST_PATH_IMAGE083
representation is based on
Figure 668528DEST_PATH_IMAGE084
Determining
Figure 156141DEST_PATH_IMAGE085
A learning rate of time, said
Figure 834247DEST_PATH_IMAGE085
Representing the model parameters when the model was trained to the t +1 generation,
Figure 596667DEST_PATH_IMAGE084
representing model parameters when the model is trained to t generation;
Figure 230779DEST_PATH_IMAGE086
representation is based on
Figure 951610DEST_PATH_IMAGE087
Determining
Figure 851433DEST_PATH_IMAGE088
A learning rate of time, said
Figure 620806DEST_PATH_IMAGE088
Representing the scaling parameters when the model was trained to the t +1 generation,
Figure 922475DEST_PATH_IMAGE087
representing the scaling parameters when the model was trained to the t generation,
Figure 751890DEST_PATH_IMAGE089
representing the hamiltonian energy loss function when the model is trained to the t generations.
Thus, the description of the device structure shown in fig. 2 is completed.
The embodiment of the application also provides a hardware structure of the device shown in fig. 2. Referring to fig. 3, fig. 3 is a structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 3, the hardware structure may include: a processor and a machine-readable storage medium having stored thereon machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the methods disclosed in the above examples of the present application.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be, for example, any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (random Access Memory), volatile Memory, non-volatile Memory, flash Memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a dvd, etc.), or similar storage medium, or a combination thereof.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (11)

1. A target identification method for adaptive learning rate collaborative optimization is characterized by comprising the following steps:
under the condition that the model is trained to t generation, obtaining a target model obtained by training in t-1 generation
Figure 614115DEST_PATH_IMAGE001
(ii) a The object model
Figure 367307DEST_PATH_IMAGE002
Is a sub-model generated based on training data in each training batch in the t-1 generation
Figure 983096DEST_PATH_IMAGE003
And a reference model
Figure 214358DEST_PATH_IMAGE004
Determining a fusion model obtained by fusion; when t is greater than 2, the process is repeated,
Figure 599203DEST_PATH_IMAGE005
is a target model obtained by training in the t-2 generation;
based on object model
Figure 523296DEST_PATH_IMAGE006
And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained
Figure 390496DEST_PATH_IMAGE007
(ii) a Wherein k represents a training batch;
according to sub-model
Figure 425448DEST_PATH_IMAGE007
And object model
Figure 930379DEST_PATH_IMAGE006
Distance between them determining submodel
Figure 25374DEST_PATH_IMAGE008
And object model
Figure 615755DEST_PATH_IMAGE006
Fusing weights of fused target models, and sub-models are fused according to the target model fusion weights
Figure 188819DEST_PATH_IMAGE007
With the object model
Figure 548256DEST_PATH_IMAGE006
Fusing to obtain a fusion model;
combining the fusion model with the target model
Figure 50038DEST_PATH_IMAGE006
Combining to obtain a target model trained in the t generation
Figure 862136DEST_PATH_IMAGE009
To pass the trained target model
Figure 504470DEST_PATH_IMAGE010
And carrying out target identification.
2. The method of claim 1, wherein the dependency submodel
Figure 983993DEST_PATH_IMAGE011
With the object model
Figure 420790DEST_PATH_IMAGE006
Distance between them determining submodel
Figure 720185DEST_PATH_IMAGE012
With the object model
Figure 635051DEST_PATH_IMAGE006
The fused target model fusion weights include:
according to sub-model
Figure 733195DEST_PATH_IMAGE013
With the object model
Figure 340894DEST_PATH_IMAGE014
Generating candidate model fusion weight according to the Manhattan distance between the two models; wherein the candidate model fusion weight is represented by:
Figure 393163DEST_PATH_IMAGE015
wherein the content of the first and second substances,
Figure 846141DEST_PATH_IMAGE016
represents
Figure 300256DEST_PATH_IMAGE017
And
Figure 813277DEST_PATH_IMAGE018
the manhattan distance between the two electrodes,
Figure 352843DEST_PATH_IMAGE019
represents
Figure 376556DEST_PATH_IMAGE020
And
Figure 685178DEST_PATH_IMAGE021
the manhattan distance between them, M represents the candidate model fusion weight;
converting each line in the candidate model fusion weight to obtain a target model fusion weight; the transition is used to control the sum of the weight parameters in each row to approach a set value.
3. The method of claim 2, wherein the converting each row of the candidate model fusion weights comprises:
and converting each line in the candidate model fusion weight by using a normalized exponential softmax function so that the sum of weight parameters in each line after conversion is close to 1.
4. The method according to claim 2 or 3, wherein the target model fusion weight is represented by the following formula:
Figure 634679DEST_PATH_IMAGE022
wherein, the first and the second end of the pipe are connected with each other,
Figure 130383DEST_PATH_IMAGE023
is formed by
Figure 456322DEST_PATH_IMAGE024
The light-emitting diode is obtained by conversion,
Figure 619450DEST_PATH_IMAGE025
is through pair
Figure 739853DEST_PATH_IMAGE026
The light-emitting diode is obtained by conversion,
Figure 486966DEST_PATH_IMAGE027
representing target model fusion weights;
the sub-model is fused according to the target model fusion weight
Figure 351017DEST_PATH_IMAGE028
With the object model
Figure 368652DEST_PATH_IMAGE001
The fusion model obtained by the fusion comprises the following steps:
selecting elements of a last row from the target model fusion weights;
using the selected elements as sub-models
Figure 925535DEST_PATH_IMAGE029
With the object model
Figure 395831DEST_PATH_IMAGE006
Fused weights, sub-models
Figure 797993DEST_PATH_IMAGE007
And object model
Figure 670134DEST_PATH_IMAGE006
And fusing to obtain a fusion model.
5. The method of claim 1, wherein the fusing the model with the target model
Figure 887665DEST_PATH_IMAGE030
Combining to obtain a t-generation trained target model
Figure 845257DEST_PATH_IMAGE031
The method comprises the following steps:
the fusion model and the target model are combined according to the following formula
Figure 316689DEST_PATH_IMAGE006
Carrying out combination:
Figure 43337DEST_PATH_IMAGE032
wherein, the first and the second end of the pipe are connected with each other,
Figure 676443DEST_PATH_IMAGE033
for the defined hyper-parameters for controlling the learning speed,
Figure 386910DEST_PATH_IMAGE034
representing the fusion model.
6. The method of claim 1, wherein the sub-models are generated based on training data in each training batch
Figure 396455DEST_PATH_IMAGE013
Is evaluated by a constructed loss function; the loss function will be described as being based on sub-models
Figure 476144DEST_PATH_IMAGE013
And object model
Figure 545731DEST_PATH_IMAGE006
A regularization function of the Manhattan distance and a Hamilton energy loss function; the Hamiltonian energy loss function is used for helping the trained model to approach the theoretical optimal model from the energy perspective.
7. The method of claim 6, wherein the loss function is expressed by the following equation:
Figure 477915DEST_PATH_IMAGE035
wherein, the first and the second end of the pipe are connected with each other,
Figure 291150DEST_PATH_IMAGE036
representing the hamiltonian energy loss function when the model is trained to the t generation,
Figure 992390DEST_PATH_IMAGE037
the label value is used for helping the predicted value to approach the theoretical optimal label value from the energy angle;
Figure 967299DEST_PATH_IMAGE038
to represent
Figure 652359DEST_PATH_IMAGE039
And
Figure 239591DEST_PATH_IMAGE040
the Manhattan distance between the two, a and b are [1, n +1 ]],
Figure 326496DEST_PATH_IMAGE041
=
Figure 206727DEST_PATH_IMAGE042
8. The method of claim 6 or 7, wherein the hamiltonian energy loss function is represented by the following equation:
Figure 113503DEST_PATH_IMAGE043
wherein, the first and the second end of the pipe are connected with each other,
Figure 268541DEST_PATH_IMAGE044
representing the hamiltonian energy loss function,
Figure 678794DEST_PATH_IMAGE045
is a variable of the first order and is,
Figure 261085DEST_PATH_IMAGE046
the parameters of the model are represented by,
Figure 153692DEST_PATH_IMAGE047
representing output values with respect to a model
Figure 846842DEST_PATH_IMAGE048
And model parameters
Figure 642760DEST_PATH_IMAGE046
Co-optimized function, model output value
Figure 130373DEST_PATH_IMAGE048
Is the output value of the model when the model is trained in the kth training batch in the t generation.
9. The method of claim 8, wherein the step of measuring is performed by a computer
Figure 746162DEST_PATH_IMAGE049
Expressed by a proportional control-integral control-derivative control PID equation, the
Figure 243002DEST_PATH_IMAGE049
Represented by the formula:
Figure 627847DEST_PATH_IMAGE050
where lr represents an initial learning rate,
Figure DEST_PATH_IMAGE051
representing the loss gradient of the model training to the t generation,
Figure 522247DEST_PATH_IMAGE052
the scaling parameters are represented by a scale parameter,
Figure 890912DEST_PATH_IMAGE053
representing the output value of the model when the model is trained to the kth training batch when the model is trained to the t generation
Figure 660285DEST_PATH_IMAGE054
With corresponding tag value
Figure 165215DEST_PATH_IMAGE055
The cross entropy of (2) is used for measuring the accuracy of model training;
wherein, when setting as follows for p:
Figure 260210DEST_PATH_IMAGE056
then, then
Figure 850591DEST_PATH_IMAGE057
Figure 689234DEST_PATH_IMAGE058
The update rules of (1) are respectively:
Figure 812786DEST_PATH_IMAGE059
Figure 813103DEST_PATH_IMAGE060
Figure 890781DEST_PATH_IMAGE061
the representation is based on
Figure 1956DEST_PATH_IMAGE062
Determining
Figure 215900DEST_PATH_IMAGE063
A learning rate of time, said
Figure 652697DEST_PATH_IMAGE063
Representing the model parameters when the model was trained to the t +1 generation,
Figure 719136DEST_PATH_IMAGE062
representing model parameters when the model is trained to t generation;
Figure 899581DEST_PATH_IMAGE064
the representation is based on
Figure 499190DEST_PATH_IMAGE065
Determining
Figure 841309DEST_PATH_IMAGE066
A learning rate of time, said
Figure 628000DEST_PATH_IMAGE066
Representing the scaling parameters when the model was trained to the t +1 generation,
Figure 612136DEST_PATH_IMAGE065
representing the scaling parameters when the model was trained to the t generation,
Figure 66251DEST_PATH_IMAGE037
representing the hamiltonian energy loss function when the model is trained to t generations.
10. An adaptive learning rate collaborative optimization target identification device, the device comprising:
an obtaining unit, configured to obtain the target model obtained by training the t-1 generation when the model is trained to the t generation
Figure 77808DEST_PATH_IMAGE006
(ii) a The object model
Figure 351794DEST_PATH_IMAGE006
Is a sub-model generated based on training data in each training batch in t-1 generation
Figure 874042DEST_PATH_IMAGE067
And a reference model
Figure 182664DEST_PATH_IMAGE068
Determining a fusion model obtained by fusion; when the value t is greater than 2, the value,
Figure 866586DEST_PATH_IMAGE068
a target model obtained by training in the t-2 generation; and, based on the object model
Figure 627869DEST_PATH_IMAGE014
And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained
Figure 688229DEST_PATH_IMAGE069
(ii) a Wherein k represents a training batch;
a fusion unit for determining sub-models
Figure 630120DEST_PATH_IMAGE070
With the object model
Figure 750522DEST_PATH_IMAGE014
Distance between them determining submodel
Figure 733522DEST_PATH_IMAGE013
And object model
Figure 597573DEST_PATH_IMAGE014
Fusing weights of fused target models, and sub-models are fused according to the target model fusion weights
Figure 880786DEST_PATH_IMAGE070
With the object model
Figure 172090DEST_PATH_IMAGE014
Fusing to obtain a fusion model;
a determination unit for combining the fusion model with the target model
Figure 642386DEST_PATH_IMAGE014
Combining to obtain a target model trained in the t generation
Figure 808663DEST_PATH_IMAGE071
To pass the trained target model
Figure 946383DEST_PATH_IMAGE010
And carrying out target identification.
11. An electronic device, comprising: a processor and a machine-readable storage medium;
the machine-readable storage medium stores machine-executable instructions executable by the processor;
the processor is operable to execute machine executable instructions to perform the method steps of any of claims 1-9.
CN202210499300.3A 2022-05-09 2022-05-09 Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment Active CN114612750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210499300.3A CN114612750B (en) 2022-05-09 2022-05-09 Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210499300.3A CN114612750B (en) 2022-05-09 2022-05-09 Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Publications (2)

Publication Number Publication Date
CN114612750A CN114612750A (en) 2022-06-10
CN114612750B true CN114612750B (en) 2022-08-19

Family

ID=81869151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210499300.3A Active CN114612750B (en) 2022-05-09 2022-05-09 Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Country Status (1)

Country Link
CN (1) CN114612750B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049539A (en) * 2022-01-10 2022-02-15 杭州海康威视数字技术股份有限公司 Collaborative target identification method, system and device based on decorrelation binary network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109841220B (en) * 2017-11-24 2022-09-13 深圳市腾讯计算机系统有限公司 Speech signal processing model training method and device, electronic equipment and storage medium
CN110557829B (en) * 2019-09-17 2020-12-11 北京东方国信科技股份有限公司 Positioning method and positioning device for fusing fingerprint database
CN112861984B (en) * 2021-02-25 2022-07-01 西华大学 Speech emotion classification method based on feature fusion and ensemble learning
CN113222014A (en) * 2021-05-12 2021-08-06 深圳思谋信息科技有限公司 Image classification model training method and device, computer equipment and storage medium
CN113312445B (en) * 2021-07-29 2022-02-11 阿里云计算有限公司 Data processing method, model construction method, classification method and computing equipment
CN114332550A (en) * 2021-08-19 2022-04-12 腾讯科技(深圳)有限公司 Model training method, system, storage medium and terminal equipment
CN114047691A (en) * 2022-01-10 2022-02-15 杭州海康威视数字技术股份有限公司 Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049539A (en) * 2022-01-10 2022-02-15 杭州海康威视数字技术股份有限公司 Collaborative target identification method, system and device based on decorrelation binary network

Also Published As

Publication number Publication date
CN114612750A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
US20210256403A1 (en) Recommendation method and apparatus
EP3711000B1 (en) Regularized neural network architecture search
Gemici et al. Generative temporal models with memory
WO2019165462A1 (en) Unsupervised neural network training using learned optimizers
KR102203252B1 (en) Method and system for collaborative filtering based on generative adversarial networks
CN110770760A (en) Object-level prediction of future states of a physical system
CN114021524B (en) Emotion recognition method, device, equipment and readable storage medium
CN112906888B (en) Task execution method and device, electronic equipment and storage medium
Li et al. Energy-based models for continual learning
US20220366257A1 (en) Small and Fast Video Processing Networks via Neural Architecture Search
CN113128432B (en) Machine vision multitask neural network architecture searching method based on evolution calculation
CN114091597A (en) Countermeasure training method, device and equipment based on adaptive group sample disturbance constraint
CN114612750B (en) Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment
CN114137967B (en) Driving behavior decision method based on multi-network joint learning
JP7438544B2 (en) Neural network processing device, computer program, neural network manufacturing method, neural network data manufacturing method, neural network utilization device, and neural network downsizing method
CN110659962A (en) Commodity information output method and related device
Sharma et al. A generalized novel image forgery detection method using generative adversarial network
Nylend Data efficient deep reinforcement learning through model-based intrinsic motivation
CN116401377B (en) Time sequence knowledge graph reasoning method based on diffusion probability distribution
Annabestani et al. A new soft computing method for integration of expert's knowledge in reinforcement learn-ing problems
US20230029590A1 (en) Evaluating output sequences using an auto-regressive language model neural network
US20230336823A1 (en) Real-Time Adaptive Content Generation with Dynamic Sentiment Prediction
CN115187845A (en) Image processing method, device and equipment
CN116911373A (en) Training method and device for reinforcement learning model
CN114219099A (en) Symbolic option and action model self-learning-based task planning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant