CN114612750B - Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment - Google Patents
- Publication number: CN114612750B (application number CN202210499300.3A)
- Authority: CN (China)
- Prior art keywords: model, fusion, target, generation, training
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Neural networks; combinations of networks
- G06N3/047: Neural networks; probabilistic or stochastic networks
- G06N3/08: Neural networks; learning methods
- Y02T10/40: Climate change mitigation technologies related to transportation; engine management systems
Abstract
The embodiment of the application provides a target identification method and device for adaptive-learning-rate collaborative optimization, and an electronic device. In this embodiment, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the two, which reduces the model's forgetting of previously learned knowledge, avoids loss oscillation during neural network training, and accelerates smooth convergence of the model.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a target identification method and apparatus for adaptive learning rate collaborative optimization, and an electronic device.
Background
Deep learning has demonstrated excellent performance in areas as diverse as image processing, speech, and natural language processing. Taking face recognition as an example, the recognition rate and speed of deep learning already exceed those of manual recognition. Word recognition, text translation, and the now heavily studied field of automated driving all likewise benefit from the development of deep learning.
At present, although the back-propagation (BP) algorithm, which works well in practice, is used to compute gradients when training neural network models based on deep learning, loss oscillation still occurs, manifesting as slow training and failure of the model to converge. Loss oscillation affects the training of the neural network model and, in turn, its application, for example producing inaccurate recognition of target objects (such as faces) and/or of abnormal behaviors (such as fire, robbery, fighting, and theft).
Disclosure of Invention
The embodiment of the application provides a target identification method and device for adaptive collaborative optimization, and an electronic device, which avoid loss oscillation during neural network model training by adaptively fusing the trained historical model, thereby improving the accuracy of target identification.
The embodiment of the application provides a target identification method for adaptive collaborative optimization, which comprises the following steps:
under the condition that the model has been trained to generation t, obtaining the target model Θ_{t-1} produced by the training of generation t-1; the target model Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models θ_k generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}; when t is greater than 2, Θ_{t-2} is the target model produced by the training of generation t-2;
based on the target model Θ_{t-1}, obtaining the sub-models θ_k generated from the training data of each training batch in generation t, where k denotes the training batch;
determining, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with the target model Θ_{t-1}, and fusing the sub-models θ_k with Θ_{t-1} according to the target model fusion weights to obtain a fusion model;
combining the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t, so as to perform target identification through the trained target model Θ_t.
The embodiment of the application provides a target identification device for adaptive collaborative optimization, which comprises:
an obtaining unit, configured to: when the model has been trained to generation t, obtain the target model Θ_{t-1} produced by the training of generation t-1, where Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models θ_k generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}, and when t is greater than 2, Θ_{t-2} is the target model produced by the training of generation t-2; and, based on the target model Θ_{t-1}, obtain the sub-models θ_k generated from the training data of each training batch in generation t, where k denotes the training batch;
a fusion unit, configured to determine, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with Θ_{t-1}, and to fuse the sub-models θ_k with Θ_{t-1} according to the target model fusion weights to obtain a fusion model;
a determination unit, configured to combine the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t, so as to perform target identification through the trained target model Θ_t.
The embodiment of the application also provides an electronic device. The electronic device includes: a processor and a machine-readable storage medium;
the machine-readable storage medium stores machine-executable instructions executable by the processor;
the processor is configured to execute machine-executable instructions to implement the steps of the above-disclosed method.
According to the technical solution above, in the embodiment of the application, during model training the sub-model trained in each training batch is adaptively fused with the target model trained in the previous generation according to the difference between the two, and the target model of the current generation is determined from the final fused model. The target model of any generation can therefore inherit information both from the target model of the previous generation and from the sub-models trained in each training batch, which reduces the current target model's forgetting of previous knowledge, avoids loss oscillation during neural network model training, and improves the accuracy of target identification.
Further, in this embodiment, by adaptively fusing the model trained in each training batch with the target model of the previous generation, after a sufficient number of iterations the difference between the per-batch sub-models and the previous generation's target model becomes very small and they essentially converge, which avoids the extreme learning rate problem and ensures the generalization performance of the method provided by the embodiment of the present application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of a method provided by an embodiment of the present application;
FIG. 2 is a structural diagram of an apparatus provided by an embodiment of the present application;
FIG. 3 is a structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, like numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
In current conventional model training, no matter which generation the model has been trained to, updates are applied directly to the most recently trained model. For example, when the model is trained to generation t, the model trained in generation t-1 (i.e., the target model of generation t-1) is updated directly; when the model is trained to generation t+1, the model trained in generation t (i.e., the target model of generation t) is updated directly; and so on.
However, it has been found that directly updating the most recently trained model in this way during training iterations is an idealization. In current model training, because of retroactive interference, catastrophic forgetting of the previously trained model (for example, of its characteristic parameters) often occurs, so that later-trained and earlier-trained models compete with each other and earlier learning is disturbed.
To solve the above technical problem, this embodiment departs from the conventional training scheme: during model training, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the two. This avoids the above parameter-forgetting problem, accelerates model convergence, avoids loss oscillation during neural network model training, and thereby improves the accuracy of target identification.
The method provided by the embodiments of the present application is described below:
referring to fig. 1, fig. 1 is a flowchart of a method provided in an embodiment of the present application. The process is applied to the electronic equipment. Optionally, as an embodiment, the electronic device may be an internet of things terminal device such as a video terminal and a door access device, which is applied to the neural network model. As another embodiment, the electronic device may also be a backend device such as a server, and the embodiment is not limited in particular.
As shown in fig. 1, the process may include the following steps:
Step 101: when the model has been trained to generation t, obtain the target model produced by the training of generation t-1.
For convenience of description, in this embodiment the target model obtained by the training of generation t-1 is denoted Θ_{t-1}.
In this embodiment, the target model Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}. When t is greater than 2, Θ_{t-2} is the target model obtained by the training of generation t-2. When t = 2, Θ_{t-2} is the initial model at the beginning of the entire training. In this embodiment, the initial model is not particularly limited; it may be, for example, a fully connected network, a convolutional neural network, or a recurrent neural network. The network structure is likewise not limited; taking a fully connected network as an example, the model may include an input layer, first to fifth hidden layers, and an output layer.
The target model Θ_{t-1} is trained in the same way as the generation-t target model (denoted Θ_t), which is described below, so its training is not repeated here.
And 102, acquiring a sub-model generated based on training data in each training batch when the generation t is based on the target model obtained by training the generation t-1.
In practice, the training data used for model training is relatively large, and the full training set is often divided into several batches for training. For example, with 50,000 training samples, the data can be divided into 10 training batches of 5,000 samples each.
It should be noted that this embodiment does not specifically limit the training data in each training batch; it may be determined according to the target to be recognized. For example, if the recognition target is a human face or a verification-code reading, feature vector data obtained by processing the corresponding picture data may be used as the training data. This embodiment is not specifically limited in this respect.
In this embodiment, for the first training batch, the training data of the first batch is used to update the target model Θ_{t-1} obtained in generation t-1 directly, yielding sub-model θ_1. For the second training batch, θ_1 is updated directly with the training data of the second batch, yielding θ_2. This is repeated until the last training batch n, where θ_{n-1} is updated with the training data of the nth batch to obtain θ_n. The sub-models generated from the training data of each training batch in generation t, where k denotes the training batch, are thus: {θ_1, θ_2, …, θ_n}.
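For intuition, the following Python (PyTorch) sketch shows one possible implementation of the per-batch sub-model generation in step 102. It is illustrative only: the function name, the SGD optimizer, and the snapshotting strategy are assumptions, not features prescribed by the embodiment.

```python
import copy
import torch

def generate_submodels(target_model_prev, batches, loss_fn, lr=1e-3):
    """Sequentially update Theta_{t-1} on each training batch and keep a
    snapshot (sub-model theta_k) after every batch. Illustrative sketch."""
    model = copy.deepcopy(target_model_prev)                # start from Theta_{t-1}
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # optimizer choice assumed
    submodels = []
    for inputs, labels in batches:                          # batch k = 1 .. n
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        submodels.append(copy.deepcopy(model))              # snapshot theta_k
    return submodels
```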
Step 103: determine, according to the distance between each sub-model generated from the training data of each training batch in generation t and the target model obtained by the training of generation t-1, the target model fusion weights for fusing the sub-models with that target model, and fuse the sub-models with the target model of generation t-1 according to those weights to obtain a fusion model.
Optionally, in this embodiment, the Manhattan distances between the sub-models θ_k generated in generation t and Θ_{t-1} can be used as adaptive weight parameters for the weighted fusion of the models, so that the resulting fusion model completes the deduction from the individual models to the whole.
Based on the above description, candidate model fusion weights are generated from the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}. Writing θ_{n+1} = Θ_{t-1}, the candidate weights form a matrix M whose entries are the pairwise Manhattan (L1) distances between the models:
M[a][b] = ||θ_a - θ_b||₁, a, b ∈ [1, n+1]
where ||θ_a - θ_b||₁ denotes the Manhattan distance between models θ_a and θ_b, and M denotes the candidate model fusion weight matrix.
After the candidate model fusion weights are obtained, as one embodiment they may be used directly as the target model fusion weights. As another embodiment, to ensure that the sum of the weight parameters in each row is close to a set value such as 1, each row of the candidate weights may be converted to obtain the target model fusion weights; the conversion controls the sum of the weight parameters in each row to approach the set value. In a specific implementation, the conversion may be realized, for example, by a normalized exponential function (softmax), and this embodiment is not particularly limited.
Corresponding to the candidate model fusion weights, the final target model fusion weights in this embodiment can be expressed as:
M̂ = softmax(M), applied row by row
where each row M̂[a] is obtained by converting the corresponding row M[a] so that its entries sum to approximately 1, and M̂ denotes the target model fusion weight matrix.
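As a concrete illustration of the weight computation, the sketch below builds the candidate matrix from pairwise Manhattan distances over flattened parameter vectors and normalizes each row with softmax. Flattening the models into vectors and the helper name `fusion_weights` are assumptions made for the example.

```python
import torch

def fusion_weights(submodels, target_prev):
    """Pairwise Manhattan distances between models, softmax-normalized per row."""
    vecs = [torch.nn.utils.parameters_to_vector(m.parameters())
            for m in submodels + [target_prev]]  # theta_1..theta_n, theta_{n+1}=Theta_{t-1}
    P = torch.stack(vecs)                        # shape (n+1, d)
    M = torch.cdist(P, P, p=1)                   # candidate weights: L1 distance matrix
    return torch.softmax(M, dim=1)               # target weights: each row sums to 1
```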
After the target model fusion weights are obtained, as described in step 103, the sub-models θ_k can be fused with the target model Θ_{t-1} according to those weights to obtain a fusion model.
As one embodiment, when fusing the sub-models θ_k with Θ_{t-1} according to the target model fusion weights, one row of elements may first be selected from the weight matrix; based on the weights described above, the last row may be selected (because its elements embody the Manhattan distances to the target model Θ_{t-1} trained in generation t-1). The selected elements are then used as the weights for fusing the sub-models θ_k with Θ_{t-1}: for example, the first element of the selected last row serves as the fusion weight of sub-model θ_1, the second element as the fusion weight of θ_2, and so on. The sub-models θ_k are then fused with Θ_{t-1} using these weights to obtain the fusion model.
In this embodiment, if the last row of elements of the target model fusion weights is selected, the fusion model can be expressed as:
θ_fuse = Σ_{k=1}^{n+1} M̂[n+1][k] · θ_k
where θ_fuse denotes the fusion model and k denotes the training batch; when k = n+1, θ_{n+1} = Θ_{t-1}.
It should be noted that the last row of the target model fusion weights is selected because all of its elements embody the Manhattan distance to the target model Θ_{t-1} trained in generation t-1, which ensures that each currently trained sub-model θ_k (i.e., each task-specific model) can absorb as much knowledge of the base model Θ_{t-1} as possible. In this embodiment, M̂ serves as an adaptive parameter in the model learning process and helps the model complete its optimization more robustly.
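Continuing the sketch above, the weighted fusion itself can be written as a weighted sum of parameter vectors using the last row of M̂; the helper name `fuse_models` and the vector round-trip are illustrative assumptions.

```python
import copy
import torch

def fuse_models(submodels, target_prev, M_hat):
    """Weighted sum of theta_1..theta_n and Theta_{t-1} using the last row of M_hat."""
    vecs = [torch.nn.utils.parameters_to_vector(m.parameters())
            for m in submodels + [target_prev]]
    P = torch.stack(vecs)                        # shape (n+1, d)
    fused_vec = M_hat[-1] @ P                    # last row = weights w.r.t. Theta_{t-1}
    fused = copy.deepcopy(target_prev)
    torch.nn.utils.vector_to_parameters(fused_vec, fused.parameters())
    return fused
```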
Step 104: combine the fusion model with the target model obtained by the training of generation t-1 to obtain the target model trained in generation t, and perform target identification through the generation-t target model.
In this embodiment, a hyper-parameter λ for controlling the learning speed is introduced when the fusion model is combined with the target model obtained by the training of generation t-1. For example, the fusion model is combined with the target model of generation t-1 according to the following formula:
Θ_t = Θ_{t-1} + λ · (θ_fuse - Θ_{t-1})
Here, λ may be preset, for example to 0.1. This embodiment is not particularly limited.
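A minimal sketch of this combination step, under the assumption that the update takes the interpolation form given above:

```python
import copy
import torch

def combine(target_prev, fused, lam=0.1):
    """Move Theta_{t-1} toward the fusion model by a step of size lam."""
    prev = torch.nn.utils.parameters_to_vector(target_prev.parameters())
    fuse = torch.nn.utils.parameters_to_vector(fused.parameters())
    new_model = copy.deepcopy(target_prev)
    torch.nn.utils.vector_to_parameters(prev + lam * (fuse - prev),
                                        new_model.parameters())
    return new_model
```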
Because the candidate fusion weights (or the target model fusion weights) act as a regularizer, after a sufficient number of iterations the difference between each task-specific model θ_k and the base model Θ_{t-1} gradually decreases, and their weights tend to be consistent. Ideally, all task-specific models θ_k and the base model Θ_{t-1} are optimized to the same model sharing the same knowledge, thereby eliminating information loss in task-specific model fusion. With this model fusion method, the model's forgetting of previous knowledge can be reduced, so that loss oscillation in neural network training is avoided and smooth convergence of the model is accelerated.
In addition, the target recognition mentioned in step 104 may be applied to pictures or videos. For example, for face recognition, step 104 may perform face recognition on a face picture through the trained target model Θ_t. As another example, when applied to the recognition of objects such as vehicles, step 104 may perform the corresponding recognition, such as vehicle recognition, on a target-object picture such as a vehicle picture through Θ_t. As a further example, when applied to abnormal behavior recognition (such as robbery, fire, or fighting), step 104 may perform abnormal behavior recognition on video images through Θ_t. This embodiment is not particularly limited.
In this embodiment, the trained target model Θ_t is also updated on demand: for example, it is updated automatically when the number of its false alarms reaches a preset threshold, or it is updated manually upon an external trigger such as a request by maintenance personnel. This embodiment is not particularly limited in this respect. The update procedure returns to the flow shown in fig. 1.
The flow shown in fig. 1 is thus completed.
As can be seen from the flow shown in fig. 1, in this embodiment the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the two, which reduces the model's forgetting of previous knowledge, avoids loss oscillation during neural network training, and accelerates smooth convergence of the model.
The sub-models θ_k generated from the training data of each training batch are described below:
In this embodiment, the sub-models θ_k generated from the training data of each training batch are evaluated by a constructed loss function. The loss function consists of two parts: a regularization function based on the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}, and a Hamiltonian energy loss function. The Hamiltonian energy loss function is used to help the trained model approach the theoretically optimal model from an energy perspective.
Optionally, in this embodiment, the loss function can be expressed as:
L = H_t + Σ_{a,b} ||θ_a - θ_b||₁
where H_t denotes the Hamiltonian energy loss function, used to help the predicted value approach the theoretically optimal label value from an energy perspective; ||θ_a - θ_b||₁ denotes the Manhattan distance between θ_a and θ_b; a and b take values in [1, n+1]; and θ_{n+1} = Θ_{t-1}.
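A rough sketch of this two-part loss, with the Hamiltonian energy term passed in as a precomputed value since its concrete form is described only in general terms below:

```python
import torch

def total_loss(H, param_vecs):
    """Hamiltonian energy term H plus the pairwise-L1 regularizer over all models."""
    reg = sum(torch.sum(torch.abs(pa - pb))      # Manhattan distance ||theta_a - theta_b||_1
              for i, pa in enumerate(param_vecs)
              for pb in param_vecs[i + 1:])
    return H + reg
```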
The Hamiltonian energy loss function involved in an embodiment of the present application is described below.
In model training, the following problems often occur. On one hand, the training data in different training batches have different distributions, which makes training the whole network more difficult. On the other hand, after the training data of a batch passes through the network layer by layer, its data distribution also changes (a phenomenon called data distribution drift). To address these problems, the model trained in this embodiment includes at least one batch normalization (BatchNorm) layer.
The batch normalization layer performs its operation according to the following formula:
y_i = γ · (x_i - μ_B) / sqrt(σ_B² + ε) + β, i = 1, …, m
where m denotes the batch size, x_i is the input, μ_B is the batch mean, σ_B² is the batch variance, ε is a small constant that keeps the denominator from being zero, γ and β are trainable parameters of the batch normalization layer, and y_i is the output.
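A minimal sketch of this forward pass (training-time batch statistics only; the running averages used for inference are omitted):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """BatchNorm forward pass matching the formula above."""
    mu = x.mean(dim=0)                  # batch mean
    var = x.var(dim=0, unbiased=False)  # batch variance
    x_hat = (x - mu) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```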
In the model, operations must also be performed on the output of the batch normalization layer, the model parameters, and so on. Taking as an example the convolution of the batch normalization output with the model parameters, this can be written as y = W ∗ x̂, where W denotes the model parameters and x̂ denotes the output of the batch normalization layer. In application, such operations on the batch normalization output and the model parameters constitute the parameters of the physical structure, and a parameter coupling problem arises in the model training process.
For this parameter coupling problem, applying the same gradient descent to the coupled parameters causes potential energy loss. To keep the energy loss during training minimal and maintain the overall performance of the model, a Hamiltonian energy loss function is introduced in this embodiment. Through the Hamiltonian energy loss function, the coupling relations (or cooperation) of different parameters are taken into account, and the gradient is guaranteed to descend along the terrain constrained by the different parameters while following the principle of minimum energy loss.
Optionally, in this embodiment, the Hamiltonian energy loss function can be expressed as a function H_t(p, W), where H_t denotes the Hamiltonian energy loss function, p is a first-order auxiliary variable, W denotes the model parameters, and H_t contains a function f(ŷ, W) optimized jointly over the model output value ŷ and the model parameters W.
As an example, in a specific implementation, f(ŷ, W) can be expressed by a proportional-integral-derivative (PID) control equation built from the following quantities: lr denotes the initial learning rate; g_t denotes the loss gradient when the model is trained to generation t; α denotes a scaling parameter; and J_i denotes the cross entropy between the model output value ŷ_i obtained when training on the i-th training batch and the corresponding label value y_i, which measures the accuracy of model training.
Optionally, in this embodiment, p can be set as follows: p is taken as the auxiliary variable of a Legendre transformation, which converts the cross entropy equation into a new coordinate system, decouples the coupled parameters, and reduces the energy loss in the model training process to a minimum so as to guarantee the overall performance of the model.
As an example, when p is set as described above, the update rules for W and α in this embodiment are, respectively:
W_{t+1} = W_t - η_W · ∂H_t/∂W
α_{t+1} = α_t - η_α · ∂H_t/∂α
where η_W denotes the learning rate determined for updating W, W_{t+1} denotes the model parameters when the model is trained to generation t+1, and W_t denotes the model parameters when the model is trained to generation t;
η_α denotes the learning rate determined for updating α, α_{t+1} denotes the scaling parameter when the model is trained to generation t+1, α_t denotes the scaling parameter when the model is trained to generation t, and H_t denotes the Hamiltonian energy loss function when the model is trained to generation t.
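The following sketch shows these two updates as plain gradient steps on H_t; the gradient-descent form, the placeholder `H_fn`, and the learning rates are assumptions for illustration only.

```python
import torch

def hamiltonian_step(W, alpha, H_fn, lr_w=1e-3, lr_a=1e-4):
    """One joint update of model parameters W and scaling parameter alpha."""
    W = W.detach().requires_grad_(True)
    alpha = alpha.detach().requires_grad_(True)
    H = H_fn(W, alpha)                           # Hamiltonian energy loss H_t
    grad_W, grad_a = torch.autograd.grad(H, [W, alpha])
    return W - lr_w * grad_W, alpha - lr_a * grad_a
```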
Finally, with the above loss function, when model training is performed on the training data of each training batch, if the loss computed from the current model's output value (also called the predicted value) and the label value (also called the true value) is found to be minimal, the model training for that batch ends, and the model at that moment is recorded as the sub-model θ_k generated by the batch.
In the early stage of training, the output value of the loss function changes greatly, that is, the loss gradient is large and the learning step is correspondingly large, which allows the model to change over a wide range and accelerates the optimization of the weight parameters. When training stabilizes, the change in the output value of the loss function becomes small, that is, the loss gradient is small and the learning step tends to a fixed value, which avoids the extreme learning rate problem, guarantees the generalization performance of the optimizer algorithm, and brings its advantages into play.
Further, in this embodiment, by using p as the auxiliary variable of the Legendre transformation, the cross entropy equation is converted into a new coordinate system, the coupled parameters are decoupled, and the energy loss during model training is reduced to a minimum, so that the model is updated more effectively and its overall performance is guaranteed.
For ease of understanding, this embodiment uses the CIFAR datasets to test networks trained by an existing method against networks trained by the method provided in the embodiments of the present application.
The CIFAR datasets are divided into the CIFAR-10 and CIFAR-100 datasets. The CIFAR-10 dataset consists of 10 classes, each with 6,000 32x32 color images, for 60,000 images in total (50,000 training images and 10,000 test images). The CIFAR-100 dataset consists of 100 classes, each containing 600 32x32 images (500 training images and 100 test images per class).
This embodiment uses the above CIFAR datasets on each of the following networks: ResNet18, ResNet34, Vgg19, PreActResNet18, DenseNet121, and GoogLeNet, testing model training by a conventional method (taking Adam as an example) and by the method provided in the embodiment of the application. Table 1 shows the average test accuracy and the epoch at which the highest accuracy is first reached.
As can be seen from Table 1, the method provided in the embodiment of the present application achieves higher accuracy on each network, and training convergence improves by 9% on average. Compared with the Adam optimizer algorithm, the method therefore reduces overshoot and accelerates parameter convergence while maintaining accuracy.
The method provided by the embodiment of the application is described above. The following describes the apparatus provided in the embodiments of the present application:
referring to fig. 2, fig. 2 is a structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 2, the apparatus may include:
an obtaining unit, configured to: when the model has been trained to generation t, obtain the target model Θ_{t-1} produced by the training of generation t-1, where Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models θ_k generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}, and when t is greater than 2, Θ_{t-2} is the target model produced by the training of generation t-2; and, based on the target model Θ_{t-1}, obtain the sub-models θ_k generated from the training data of each training batch in generation t, where k denotes the training batch;
a fusion unit, configured to determine, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with Θ_{t-1}, and to fuse the sub-models θ_k with Θ_{t-1} according to the target model fusion weights to obtain a fusion model;
a determination unit, configured to combine the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t, so as to perform target identification through the trained target model Θ_t.
Optionally, determining, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with Θ_{t-1} includes:
generating candidate model fusion weights according to the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}; writing θ_{n+1} = Θ_{t-1}, the candidate model fusion weights are expressed as:
M[a][b] = ||θ_a - θ_b||₁, a, b ∈ [1, n+1]
where ||θ_a - θ_b||₁ denotes the Manhattan distance between θ_a and θ_b, and M denotes the candidate model fusion weight matrix;
converting each row of the candidate model fusion weights to obtain the target model fusion weights, the conversion being used to control the sum of the weight parameters in each row to approach a set value.
Optionally, converting each row of the candidate model fusion weights includes:
converting each row of the candidate model fusion weights with a normalized exponential (softmax) function, so that the sum of the weight parameters in each converted row is close to 1.
Optionally, the target model fusion weights are expressed as:
M̂ = softmax(M), applied row by row
where each row M̂[a] is obtained by converting the corresponding row M[a], and M̂ denotes the target model fusion weight matrix;
fusing the sub-models θ_k with the target model Θ_{t-1} according to the target model fusion weights to obtain the fusion model includes:
selecting the elements of the last row from the target model fusion weights;
using the selected elements as the weights for fusing the sub-models θ_k with Θ_{t-1}, and fusing the sub-models θ_k with Θ_{t-1} to obtain the fusion model.
Optionally, combining the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t includes:
combining the fusion model with the target model Θ_{t-1} according to the following formula:
Θ_t = Θ_{t-1} + λ · (θ_fuse - Θ_{t-1})
where λ is a defined hyper-parameter for controlling the learning speed and θ_fuse denotes the fusion model.
Optionally, the sub-models θ_k generated from the training data of each training batch are evaluated by a constructed loss function; the loss function consists of two parts, a regularization function based on the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}, and a Hamiltonian energy loss function; the Hamiltonian energy loss function is used to help the trained model approach the theoretically optimal model from an energy perspective.
Optionally, the loss function is expressed as:
L = H_t + Σ_{a,b} ||θ_a - θ_b||₁
where H_t denotes the Hamiltonian energy loss function, used to help the predicted value approach the theoretically optimal label value from an energy perspective; ||θ_a - θ_b||₁ denotes the Manhattan distance between θ_a and θ_b; a and b take values in [1, n+1]; and θ_{n+1} = Θ_{t-1}.
Optionally, the Hamiltonian energy loss function is a function H_t(p, W), where H_t denotes the Hamiltonian energy loss function, p is a first-order auxiliary variable, W denotes the model parameters, and H_t contains a function f(ŷ, W) optimized jointly over the model output value ŷ and the model parameters W.
Optionally, f(ŷ, W) is expressed by a proportional-integral-derivative (PID) control equation built from the following quantities: lr denotes the initial learning rate; g_t denotes the loss gradient when the model is trained to generation t; α denotes a scaling parameter; and J_i denotes the cross entropy between the model output value ŷ_i obtained when training on the i-th training batch and the corresponding label value y_i, which measures the accuracy of model training;
the model parameters are updated as W_{t+1} = W_t - η_W · ∂H_t/∂W, where η_W denotes the learning rate determined for updating W, W_{t+1} denotes the model parameters when the model is trained to generation t+1, and W_t denotes the model parameters when the model is trained to generation t;
the scaling parameter is updated as α_{t+1} = α_t - η_α · ∂H_t/∂α, where η_α denotes the learning rate determined for updating α, α_{t+1} denotes the scaling parameter when the model is trained to generation t+1, α_t denotes the scaling parameter when the model is trained to generation t, and H_t denotes the Hamiltonian energy loss function when the model is trained to generation t.
Thus, the description of the device structure shown in fig. 2 is completed.
The embodiment of the application also provides a hardware structure of the device shown in fig. 2. Referring to fig. 3, fig. 3 is a structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 3, the hardware structure may include: a processor and a machine-readable storage medium having stored thereon machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the methods disclosed in the above examples of the present application.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
Claims (11)
1. A target identification method for adaptive learning rate collaborative optimization is characterized by comprising the following steps:
under the condition that the model has been trained to generation t, obtaining the target model Θ_{t-1} produced by the training of generation t-1; the target model Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models θ_k generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}; when t is greater than 2, Θ_{t-2} is the target model produced by the training of generation t-2;
based on the target model Θ_{t-1}, obtaining the sub-models θ_k generated from the training data of each training batch in generation t, where k denotes the training batch;
determining, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with the target model Θ_{t-1}, and fusing the sub-models θ_k with Θ_{t-1} according to the target model fusion weights to obtain a fusion model;
combining the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t, so as to perform target identification through the trained target model Θ_t.
2. The method of claim 1, wherein determining, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with Θ_{t-1} comprises:
generating candidate model fusion weights according to the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}; writing θ_{n+1} = Θ_{t-1}, the candidate model fusion weights are expressed as:
M[a][b] = ||θ_a - θ_b||₁, a, b ∈ [1, n+1]
where ||θ_a - θ_b||₁ denotes the Manhattan distance between θ_a and θ_b, and M denotes the candidate model fusion weight matrix;
converting each row of the candidate model fusion weights to obtain the target model fusion weights, the conversion being used to control the sum of the weight parameters in each row to approach a set value.
3. The method of claim 2, wherein converting each row of the candidate model fusion weights comprises:
converting each row of the candidate model fusion weights with a normalized exponential (softmax) function, so that the sum of the weight parameters in each converted row is close to 1.
4. The method according to claim 2 or 3, wherein the target model fusion weights are expressed as:
M̂ = softmax(M), applied row by row
where each row M̂[a] is obtained by converting the corresponding row M[a], and M̂ denotes the target model fusion weight matrix;
fusing the sub-models θ_k with the target model Θ_{t-1} according to the target model fusion weights to obtain the fusion model comprises:
selecting the elements of the last row from the target model fusion weights;
using the selected elements as the weights for fusing the sub-models θ_k with Θ_{t-1}, and fusing the sub-models θ_k with Θ_{t-1} to obtain the fusion model.
5. The method of claim 1, wherein combining the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t comprises:
combining the fusion model with the target model Θ_{t-1} according to the following formula:
Θ_t = Θ_{t-1} + λ · (θ_fuse - Θ_{t-1})
where λ is a defined hyper-parameter for controlling the learning speed and θ_fuse denotes the fusion model.
6. The method of claim 1, wherein the sub-models θ_k generated from the training data of each training batch are evaluated by a constructed loss function; the loss function consists of two parts, a regularization function based on the Manhattan distances between the sub-models θ_k and the target model Θ_{t-1}, and a Hamiltonian energy loss function; the Hamiltonian energy loss function is used to help the trained model approach the theoretically optimal model from an energy perspective.
7. The method of claim 6, wherein the loss function is expressed as:
L = H_t + Σ_{a,b} ||θ_a - θ_b||₁
where H_t denotes the Hamiltonian energy loss function when the model is trained to generation t, used to help the predicted value approach the theoretically optimal label value from an energy perspective; ||θ_a - θ_b||₁ denotes the Manhattan distance between θ_a and θ_b; a and b take values in [1, n+1]; and θ_{n+1} = Θ_{t-1}.
8. The method of claim 6 or 7, wherein the Hamiltonian energy loss function is a function H_t(p, W), where H_t denotes the Hamiltonian energy loss function, p is a first-order auxiliary variable, W denotes the model parameters, and H_t contains a function f(ŷ, W) optimized jointly over the model output value ŷ and the model parameters W; the model output value ŷ is the output value of the model when the model is trained on the k-th training batch of generation t.
9. The method of claim 8, wherein f(ŷ, W) is expressed by a proportional-integral-derivative (PID) control equation built from the following quantities: lr denotes the initial learning rate; g_t denotes the loss gradient when the model is trained to generation t; α denotes a scaling parameter; and J_k denotes the cross entropy between the model output value ŷ_k obtained when training on the k-th training batch of generation t and the corresponding label value y_k, which measures the accuracy of model training;
the model parameters are updated as W_{t+1} = W_t - η_W · ∂H_t/∂W, where η_W denotes the learning rate determined for updating W, W_{t+1} denotes the model parameters when the model is trained to generation t+1, and W_t denotes the model parameters when the model is trained to generation t;
the scaling parameter is updated as α_{t+1} = α_t - η_α · ∂H_t/∂α, where η_α denotes the learning rate determined for updating α, α_{t+1} denotes the scaling parameter when the model is trained to generation t+1, α_t denotes the scaling parameter when the model is trained to generation t, and H_t denotes the Hamiltonian energy loss function when the model is trained to generation t.
10. A target identification device for adaptive learning rate collaborative optimization, the device comprising:
an obtaining unit, configured to: when the model has been trained to generation t, obtain the target model Θ_{t-1} produced by the training of generation t-1, where Θ_{t-1} is determined from a fusion model obtained by fusing the sub-models θ_k generated from the training data of each training batch in generation t-1 with a reference model Θ_{t-2}, and when t is greater than 2, Θ_{t-2} is the target model produced by the training of generation t-2; and, based on the target model Θ_{t-1}, obtain the sub-models θ_k generated from the training data of each training batch in generation t, where k denotes the training batch;
a fusion unit, configured to determine, according to the distance between each sub-model θ_k and the target model Θ_{t-1}, the target model fusion weights for fusing the sub-models θ_k with Θ_{t-1}, and to fuse the sub-models θ_k with Θ_{t-1} according to the target model fusion weights to obtain a fusion model;
a determination unit, configured to combine the fusion model with the target model Θ_{t-1} to obtain the target model Θ_t trained in generation t, so as to perform target identification through the trained target model Θ_t.
11. An electronic device, comprising: a processor and a machine-readable storage medium;
the machine-readable storage medium stores machine-executable instructions executable by the processor;
the processor is operable to execute machine executable instructions to perform the method steps of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210499300.3A CN114612750B (en) | 2022-05-09 | 2022-05-09 | Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114612750A CN114612750A (en) | 2022-06-10 |
CN114612750B (en) | 2022-08-19
Family
ID=81869151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210499300.3A Active CN114612750B (en) | 2022-05-09 | 2022-05-09 | Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612750B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114049539A (en) * | 2022-01-10 | 2022-02-15 | 杭州海康威视数字技术股份有限公司 | Collaborative target identification method, system and device based on decorrelation binary network |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109841220B (en) * | 2017-11-24 | 2022-09-13 | 深圳市腾讯计算机系统有限公司 | Speech signal processing model training method and device, electronic equipment and storage medium |
CN110557829B (en) * | 2019-09-17 | 2020-12-11 | 北京东方国信科技股份有限公司 | Positioning method and positioning device for fusing fingerprint database |
CN112861984B (en) * | 2021-02-25 | 2022-07-01 | 西华大学 | Speech emotion classification method based on feature fusion and ensemble learning |
CN113222014A (en) * | 2021-05-12 | 2021-08-06 | 深圳思谋信息科技有限公司 | Image classification model training method and device, computer equipment and storage medium |
CN113312445B (en) * | 2021-07-29 | 2022-02-11 | 阿里云计算有限公司 | Data processing method, model construction method, classification method and computing equipment |
CN114332550A (en) * | 2021-08-19 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Model training method, system, storage medium and terminal equipment |
CN114047691A (en) * | 2022-01-10 | 2022-02-15 | 杭州海康威视数字技术股份有限公司 | Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment |
- 2022-05-09: CN application CN202210499300.3A filed; granted as patent CN114612750B (active)
Also Published As
Publication number | Publication date |
---|---|
CN114612750A (en) | 2022-06-10 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |