CN114612750B

CN114612750B - Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Info

Publication number: CN114612750B
Application number: CN202210499300.3A
Authority: CN
Inventors: 张宝昌; 王滨; 王润琪; 吕金虎; 王田; 王星; 张峰
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2022-05-09
Filing date: 2022-05-09
Publication date: 2022-08-19
Anticipated expiration: 2042-05-09
Also published as: CN114612750A

Abstract

The embodiment of the application provides a target identification method and device for adaptive learning rate collaborative optimization and electronic equipment. In this embodiment, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the model trained in each training batch and the previously trained model, so that forgetting of the model on previous knowledge can be reduced, loss oscillation in the neural network training process is avoided, and smooth convergence of the model is accelerated.

Description

Target identification method and device for adaptive learning rate collaborative optimization and electronic equipment

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a target identification method and apparatus for adaptive learning rate collaborative optimization, and an electronic device.

Background

Deep learning has demonstrated excellent performance in various different areas of image, speech, natural language processing, and the like. Taking face recognition as an example, the recognition rate and recognition speed of deep learning already exceed the speed of manual recognition. In addition, word recognition, text translation, and automated driving, which is now heavily studied, all benefit from the development of deep learning.

At present, although a reverse feedback (BP) algorithm with a good effect is adopted to calculate a gradient in a neural network model training process based on deep learning, oscillation loss still exists, such as slow training speed and incapability of converging the model. The loss shock can affect the training of the neural network model, and further affect the application of the neural network model, such as inaccuracy of recognized target objects (such as faces) and/or abnormal action behaviors (such as fire, robbery, fighting and stealing).

Disclosure of Invention

The embodiment of the application provides a target identification method and device for adaptive collaborative optimization and electronic equipment, so that loss oscillation in the training process of a neural network model is avoided by adaptively fusing a trained historical model, and the accuracy of target identification is improved.

The embodiment of the application provides a target identification method for adaptive collaborative optimization, which comprises the following steps:

under the condition that the model is trained to t generation, obtaining a target model obtained by training in t-1 generation

(ii) a The object model

Is a sub-model generated based on training data in each training batch in the t-1 generation

And a reference model

Determining a fusion model obtained by fusion; when the value t is greater than 2, the value,

is a target model obtained by training in the t-2 generation;

based on a target model

And obtaining the sub-model generated based on the training data in each training batch when t generation is obtained

(ii) a Wherein k represents a training batch;

according to sub-model

With the object model

Distance between them determining submodel

With the object model

Fusing weights of the fused target models, and combining the sub-models according to the target model fusion weights

With the object model

Fusing to obtain a fusion model;

combining the fusion model with the target model

Combining to obtain a target model trained in the t generation

To pass the trained target model

And carrying out target identification.

The embodiment of the application provides a target identification device for adaptive collaborative optimization, which comprises:

an obtaining unit, configured to obtain the target model obtained through training in the t-1 generation when the model is trained to the t generation

(ii) a The object model

And a reference model

Determining a fusion model obtained by fusion; when t is greater than 2, the process is repeated,

is a target model obtained by training in the t-2 generation; and, based on the object model

(ii) a Wherein k represents a training batch;

a fusion unit for being dependent on the sub-model

With the object model

Distance between them determining submodel

With the object model

Fusing weights of fused target models, and sub-models are fused according to the target model fusion weights

With the object model

Fusing to obtain a fusion model;

a determination unit for combining the fusion model with the target model

Combining to obtain a target model trained in the t generation

To pass the trained target model

And carrying out target identification.

The embodiment of the application also provides the electronic equipment. The electronic device includes: a processor and a machine-readable storage medium;

the machine-readable storage medium stores machine-executable instructions executable by the processor;

the processor is configured to execute machine-executable instructions to implement the steps of the above-disclosed method.

According to the technical scheme, in the embodiment of the application, in the model training process, the sub-model trained in each training batch is adaptively fused with the target model trained in the previous generation according to the difference between the sub-model trained in each training batch and the target model trained in the previous generation, and the target model trained to the current generation is determined according to the finally fused model, so that the target model trained in any generation can inherit the information of the target model trained in the previous generation and the sub-model trained in each training batch, the forgetting of the previous knowledge by the target model trained at present is reduced, the loss oscillation in the neural network model training process is avoided, and the accuracy of target identification is improved;

further, in the embodiment, by adaptively fusing the model trained in each training batch with the target model trained in the previous generation, after iteration is performed for a sufficient number of times, it is finally found that the difference between the sub-model trained in the training batch and the target model trained in the previous generation is very small and basically tends to be consistent, so that the problem of extreme learning rate is avoided, and the generalization performance of the method provided by the embodiment of the present application is ensured.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow chart of a method provided by an embodiment of the present application;

FIG. 2 is a block diagram of an apparatus according to an embodiment of the present disclosure;

fig. 3 is a structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

In the conventional model training at present, no matter which generation the model is trained to, the model is directly updated on the newly trained model at present. For example, when the model is trained to t generation, the model trained in t-1 generation (i.e. the target model trained in t-1 generation) is directly updated, and when the model is trained to t +1 generation, the model trained in t generation (i.e. the target model trained in t generation) is directly updated, and so on.

However, it has been found that such direct updating of the currently newly trained model during the model training iteration is ideal. In the training process of the current model, due to the inverted shooting inhibition, catastrophic forgetfulness to a previously trained model, such as a model characteristic parameter, often occurs, so that a later trained model and a previously trained model compete with each other, and the previous learning is interfered.

In order to solve the above technical problems, this embodiment breaks through the conventional model training manner, and adaptively fuses the model trained in each training batch with the previously trained model according to the difference between the model trained in each training batch and the previously trained model during model training, so as to avoid the above feature parameter forgetting problem, accelerate model convergence, avoid loss oscillation during neural network model training, and further improve the accuracy of target identification.

The method provided by the embodiments of the present application is described below:

referring to fig. 1, fig. 1 is a flowchart of a method provided in an embodiment of the present application. The process is applied to the electronic equipment. Optionally, as an embodiment, the electronic device may be an internet of things terminal device such as a video terminal and a door access device, which is applied to the neural network model. As another embodiment, the electronic device may also be a backend device such as a server, and the embodiment is not limited in particular.

As shown in fig. 1, the process may include the following steps:

and 101, under the condition that the model is trained to t generation, obtaining a target model obtained by training in t-1 generation.

For convenience of description, in this embodiment, the target model obtained by training in the t-1 generation is denoted as

。

In the present embodiment, the object model

And a reference model

And determining a fusion model obtained by fusion. When the value t is greater than 2, the value,

the target model obtained by training in the t-2 generation. Of course, when t =2,

is the initial model at the beginning of the entire model training. In this embodiment, the initial model is not particularly limited, and may be, for example, a fully-connected network, a convolutional neural network, a cyclic neural network, or the like. As for the network structure, the embodiment is not limited, for example, taking a fully connected network as an example, the model may include: the device comprises an input layer, a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer and an output layer.

As for the object model

The training mode is similar to how the target model of t generation is trained (noted as

). For the moment, this description will not be repeated.

And 102, acquiring a sub-model generated based on training data in each training batch when the generation t is based on the target model obtained by training the generation t-1.

In application, training data used for model training is relatively large, and a whole set of training data is often divided into a plurality of batches for training. For example, there are 50000 training data, and 50000 training data can be divided into 10 training batches, each training batch containing 5000 training data.

It should be noted that, this embodiment does not specifically limit the training data in each training batch, and may determine the training data according to a target to be recognized, for example, the recognition target is a human face, a verification code reading, and the like, and may use feature vector data obtained by processing corresponding picture data as the training data, and the like, which is not specifically limited in this embodiment.

In this embodiment, for the first training batch, the training data of the first training batch is used to directly train the target model obtained in the t-1 generation

Performing model updating to obtain

For the second training batch, directly pairing the training data in the second training batch

Updating the model to obtain

Repeating the steps until the last training batch n is obtained, and obtaining n-1 training batches by using the training data in the nth training batch

Updating the model to obtain

. Finally, a sub-model generated based on training data in each training batch when t generation is obtainedModel (III)

And k represents a training batch.

Represented by the formula:

}。

and 103, determining a target model fusion weight for fusion of the sub-model and the target model obtained by training in the t-1 generation according to the distance between the sub-model generated based on the training data in each training batch and the target model obtained by training in the t-1 generation during the t generation, and fusing the sub-model and the target model obtained by training in the t-1 generation according to the target model fusion weight to obtain a fusion model.

Optionally, in this embodiment, the sub-model generated based on the training data in each training batch at the time of generation t can be used

And

the Manhattan distance between the two models is used as an adaptive weight parameter for weighted fusion of the models so as to realize that the finally obtained fusion model completes the deduction from an individual to the whole.

Based on the above description, follow the sub-model

With the object model

The manhattan distance between them, a candidate model fusion weight is generated as represented by:

wherein the content of the first and second substances,

represent

And

the manhattan distance between the two electrodes,

represents

And

manhattan distance between them, M represents the candidate model fusion weight.

After the candidate model fusion weight is obtained, as one embodiment, the candidate model fusion weight may be taken as the target model fusion weight. As another embodiment, in order to ensure that the sum of the weight parameters in each row is close to a set value, such as 1, each row in the candidate model fusion weights may be transformed to obtain the target model fusion weight. Wherein the transition is used to control the sum of the weight parameters in each row to approach a set value. In a specific implementation, for example, the conversion may be implemented by a normalized exponential function (softmax), and the like, and the embodiment is not particularly limited.

Corresponding to the candidate model fusion weight, the final target model fusion weight in this embodiment can be represented by the following formula:

wherein the content of the first and second substances,

is through pair

The light-emitting diode is obtained by conversion,

is through pair

The light-emitting diode is obtained by conversion,

representing the target model fusion weights.

After the target model fusion weight is obtained, as described in step 103, the sub-models can be combined according to the target model fusion weight

With the object model

And fusing to obtain a fusion model.

As one embodiment, sub-models are fused with weights according to the target model

And object model

When merging, one row of elements may be selected from the target model fusion weights first, and based on the target model fusion weights described above, the last row of elements in the target model fusion weights may be selected (because the elements embody the target model trained with t-1 generation)

Manhattan distance between). Thereafter, the selected elements are used as sub-models

With the object model

The fused weight, based on the object model fusion weight described above, may be, for example, the first element of the last line of the selection as a sub-model

And object model

The fused weight can use the second element of the last line of the selection as a sub-model

And object model

The fused weight is analogized in turn. Then, based on the sub-model

With the object model

Fused weights, sub-models

With the object model

And fusing to obtain a fused model.

In this embodiment, if the last row of elements in the fusion weight of the target model is selected, the fusion model can be represented by the following formula:

wherein the content of the first and second substances,

representing the fusion model, k represents the training batch, when k = n +1,

=

。

it should be noted that, in this embodiment, the last row of elements in the target model fusion weight is selected because all the last row of elements exhibit the target model trained in the t-1 generation

The Manhattan distance between the two sub-models ensures the sub-model trained currently

(i.e., the task-specific model) can absorb as much of the base model as possible

Knowledge of (a). In the present embodiment of the present invention,

the method can be used as an adaptive parameter in a model learning process, and helps the model to complete optimization more robustly.

And 104, combining the fusion model with the target model obtained by training in the t-1 generation to obtain a target model trained in the t generation, and performing target identification through the target model trained in the t generation.

In this embodiment, the target model trained in the t generation can be written as

。

In the embodiment, the use of the fusion model is introduced when the fusion model is combined with the target model obtained by training in the t-1 generationOver-parameters in controlling learning speed

. For example, the fusion model is combined with the target model obtained by training in the t-1 generation according to the following formula:

here, the number of the first and second electrodes,

it may be preset, such as to 0.1. The present embodiment is not particularly limited.

Due to the above candidate fusion weight

Or the object model fusion weights regularization, such that after a sufficient number of iterations, the task-specific model type

And a basic model

The difference between them is gradually reduced, and the task is specific to the model type

And basic model

The weights of (a) and (b) tend to be consistent. When all models are task-specific models

And a basic model

Ideally optimized to the same model, which share the same knowledge, thereby eliminating task specific modelsInformation is lost in type fusion. By the model fusion method, the forgetting of the model to the previous knowledge can be reduced, so that loss oscillation in the neural network training process is avoided, and the smooth convergence of the model is accelerated.

In addition, the target recognition mentioned in step 104 may be used for target recognition of pictures or videos. For example, for face recognition, step 104 may be implemented by training the target model

Carrying out face recognition on the face picture; as another example, when applied to the recognition of objects, such as vehicles, the step 104 may be implemented by training the object model

Performing target correspondence such as vehicle recognition on a target object picture such as a vehicle picture; for example, the method can be applied to abnormal behavior recognition (such as robbery, fire, fighting, etc.), and the step 104 can be implemented by trained target models

The video image is subjected to abnormal behavior identification, and this embodiment is not particularly limited.

In this embodiment, the trained object model is used as the target model

Also updated on demand, e.g. in the target model

When the number of the false alarms reaches a preset threshold value, automatically updating; as another example, manual updates and the like are performed based on external triggers such as maintenance personnel and the like. The present embodiment is not particularly limited. As for the updating manner, the flow shown in fig. 1 can be returned.

The flow shown in fig. 1 is thus completed.

As can be seen from the flow shown in fig. 1, in the present embodiment, in the model training process, the model trained in each training batch is adaptively fused with the previously trained model according to the difference between the model trained in each training batch and the previously trained model, so that the forgetting of the model to the previous knowledge can be reduced, thereby avoiding the loss oscillation in the neural network training process and accelerating the smooth convergence of the model.

The sub-model generated for the training data in each training batch is as follows

The description is that:

in the present embodiment, the sub-models are generated based on the training data in each training batch

Is evaluated by the constructed loss function. Here, the loss function will be described as being based on a sub-model

With the object model

The regularization function of the Manhattan distance and the Hamilton energy loss function are two parts; the Hamiltonian energy loss function is used to help the trained model approach the theoretically optimal model from an energy perspective.

Alternatively, in this embodiment, the loss function can be represented by the following formula:

wherein, the first and the second end of the pipe are connected with each other,

representing the hamiltonian energy loss function,

for assisting in predicting values fromThe energy angle approaches the theoretical optimal label value;

to represent

And

the Manhattan distance between the two, a and b are [1, n +1 ]]，

=

。

The following describes a hamiltonian energy loss function related to an embodiment of the present application:

in model training, the following problems often occur: on one hand, training data in different training batches have different distributions, and the different distributions bring difficulty to the training of the whole network; on the other hand, after the training data in the training batch is calculated by a layer-by-layer network, the data distribution of the training data also changes (this phenomenon is called data distribution drift). In order to solve the above problem, the model trained in this embodiment at least includes a batch normalization (batch norm) layer.

The batch normalization layer performs corresponding operations through the following formula:

wherein the content of the first and second substances,

indicates the batch size (batch size),

in order to be an input, the user can select,

is the average value of the batches,

is the variance of the batch,

is a minimum value to prevent points from being scored

It is not meaningful to have the information that,

and

as a trainable parameter of the batch specification layer,

is the output.

In the model, operations also need to be performed on the output of the batch specification layer, model parameters, and the like. Taking the convolution operation performed on the output of the batch specification layer and the model parameter as an example, the convolution operation performed on the output of the batch specification layer and the model parameter is expressed as follows:

. Wherein the content of the first and second substances,

the parameters of the model are represented by,

representing the output of the batch specification layer. In application, operations are performed on the output of the batch specification layer and model parameters, etc., which constitute parameters in the physical structure

The parameter coupling problem exists in the model training process.

For the parameter coupling problem existing in the model training process, if the same gradient is reduced on the parameter, potential energy loss is caused. In order to ensure the minimum energy loss in the model training process and maintain the overall performance of the model, a hamilton energy loss function is introduced in the embodiment. Coupling relations or cooperation of different parameters are considered through a Hamilton energy loss function, the gradient is ensured to descend along the terrain constrained by the different parameters on the premise of following the minimum energy loss principle, and the coupling relations among the different parameters are fully considered.

Alternatively, in the present embodiment, the hamiltonian energy loss function can be represented by the following formula:

wherein the content of the first and second substances,

representing the hamiltonian energy loss function,

is a variable of the first order and is,

the parameters of the model are represented by,

representing output values with respect to a model

And model parameters

A co-optimized function.

As an example, in particular implementations,

can be proportional controlSystem-integral-control-differential-control (PID) equation representation, e.g.

Comprises the following steps:

where lr represents an initial learning rate,

representing the loss gradient of the model training to the t generation,

representing a scaling parameter.

Representing the output value of the model during the training of the model in the ith training batch

With corresponding tag value

The cross entropy of (2) is used for measuring the accuracy of model training.

Optionally, in the present embodiment, can be directed to

The following settings were made:

to make

And as an auxiliary variable of Legendre change, converting a cross entropy equation into a coordinate system, decoupling coupled parameters, and reducing the energy loss in the model training process to the minimum to ensure the overall performance of the model.

As an example, when addressing

When the above-described setting is performed, correspondingly, in the present embodiment,

、

the update rules of (1) are respectively:

；

；

wherein the content of the first and second substances,

the representation is based on

Determining

A learning rate of time, said

Representing the model parameters when the model was trained to the t +1 generation,

representing model parameters when the model is trained to t generation;

the representation is based on

Determining

A learning rate of time, said

Representing the scaling parameters when the model was trained to the t +1 generation,

representing the scaling parameters when the model was trained to the t generation,

representing the hamiltonian energy loss function when the model is trained to the t generations.

Finally, with the loss function, when model training is performed based on the training data in each training batch, if the loss function calculated based on the output value (also referred to as predicted value) and the label value (also referred to as true value) of the current model is found to be the minimum, the model training of the training batch is ended, and the model at that time is marked as the sub-model generated by the training batch

。

In the early stage of training, the output value of the loss function is changed greatly, namely the loss gradient is large, and correspondingly the learning step length is large, so that the model can be allowed to change in a large range, and the optimization of the weight parameters is accelerated. When training tends to be stable, the output value change of the loss function becomes small, namely the loss gradient is small, the learning step length tends to be fixed, the problem of extreme learning rate is avoided, the generalization performance of the optimizer algorithm is ensured, and the advantages of the optimizer algorithm are exerted.

Further, in the present embodiment, by

As an auxiliary variable of Legendre change, a cross entropy equation is converted into a coordinate system, coupled parameters are decoupled, and a model is trainedThe energy loss in the process is reduced to the minimum, the model is updated more effectively, and the overall performance of the model is ensured.

For ease of understanding, the present embodiment uses a CIFAR dataset to test a network trained by the existing method and a network trained by the method provided in the embodiments of the present application.

In the present embodiment, CIFAR data sets are divided into CIFAR-10 and CIFAR-100 data sets. The CIFAR-10 dataset consists of 10 classes, each of which has 60000 32x32 color images (of which there are 50,000 training images and 10,000 test images). The CIFAR-100 dataset consists of 100 classes, each containing 600 32x32 images (of which there are 500 training images and 100 test images).

The present embodiment uses the above CIFAR dataset for each of the following networks: ResNet18, ResNet34, Vgg19, PreActResNet18, DenseNet121, GoodLeNet test model training according to a conventional method (taking Adam as an example), and test model training according to the method provided by the embodiment of the application, and Table 1 shows the average accuracy of the test and the epoch which reaches the highest accuracy for the first time:

as can be seen from table 1, the method provided in the embodiment of the present application has higher accuracy on each network, and the training convergence is improved by 9% on average. Therefore, compared with the Adam optimizer algorithm, the method provided by the embodiment of the application can be proved to have the effects of reducing overshoot and accelerating parameter convergence on the premise of ensuring the accuracy.

The method provided by the embodiment of the application is described above. The following describes the apparatus provided in the embodiments of the present application:

referring to fig. 2, fig. 2 is a structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 2, the apparatus may include:

an obtaining unit, configured to obtain the target model obtained by training the t-1 generation when the model is trained to the t generation

(ii) a The object model

And a reference model

(ii) a Wherein k represents a training batch;

a fusion unit for determining sub-models

And object model

Distance between them determining submodel

With the object model

With the object model

Fusing to obtain a fusion model;

a determination unit for combining the fusion model with the target model

Combining to obtain a target model trained in the t generation

To pass the trained target model

And carrying out target identification.

Optionally, the dependency sub-model

And object model

Distance between them determining submodel

With the object model

The fused target model fusion weights include:

according to sub-model

And object model

Generating candidate model fusion weight according to the Manhattan distance; wherein the candidate model fusion weight is given byRepresents:

wherein the content of the first and second substances,

represents

And

the manhattan distance between them,

represents

And

m represents the candidate model fusion weight;

converting each line in the candidate model fusion weight to obtain a target model fusion weight; the transition is used to control the sum of the weight parameters in each row to approach a set value.

Optionally, the converting each row of the candidate model fusion weights comprises:

each row in the candidate model fusion weight is transformed by using a normalized index softmax function, so that the sum of weight parameters in each transformed row is close to 1.

Optionally, the target model fusion weight is represented by:

wherein the content of the first and second substances,

is formed by

The light-emitting diode is obtained by conversion,

is formed by

The light-emitting diode is obtained by conversion,

representing target model fusion weights;

the sub-model is fused according to the target model fusion weight

With the object model

The fusion model obtained by the fusion comprises the following steps:

selecting elements of a last line from the target model fusion weights;

using the selected elements as sub-models

With the object model

Fused weights, sub-models

And object model

And fusing to obtain a fusion model.

Optionally, the fusing model and the target model are combined

Combining to obtain a t-generation trained target model

The method comprises the following steps:

the fusion model and the target model are combined according to the following formula

Carrying out combination:

wherein the content of the first and second substances,

for the defined hyper-parameters for controlling the learning speed,

representing the fusion model.

Optionally, a sub-model generated based on training data in each training batch

Is evaluated by a constructed loss function; the loss function will be described as being based on sub-models

With the object model

The regularization function of the Manhattan distance and the Hamilton energy loss function are two parts; the Hamiltonian energy loss function is used for helping the trained model to approach the theoretical optimal model from the energy perspective.

Optionally, the loss function is represented by the following formula:

representing the function of the hamiltonian energy loss,

the label value is used for helping the predicted value to approach the theoretical optimal label value from the energy angle;

to represent

And

the Manhattan distance between the two, a and b are [1, n +1 ]]，

=

。

Optionally, the hamiltonian energy loss function is represented by:

；

wherein the content of the first and second substances,

representing the hamiltonian energy loss function,

is a variable of the first order and is,

the parameters of the model are represented by a number,

representing output values with respect to a model

And model parameters

A co-optimized function.

Optionally, the

Expressed by a proportional control-integral control-derivative control PID equation, the

Represented by the formula:

wherein, lr represents an initial learning rate,

representing the loss gradient of the model training to the t generation,

a scaling parameter is represented that is representative of,

With corresponding tag value

Cross entropy of (1) for measuring accuracy of model training；

Wherein, when setting as follows for p:

then, then

、

The update rules of (a) are respectively:

；

；

representation is based on

Determining

A learning rate of time, said

representing model parameters when the model is trained to t generation;

representation is based on

Determining

A learning rate of time, said

Thus, the description of the device structure shown in fig. 2 is completed.

The embodiment of the application also provides a hardware structure of the device shown in fig. 2. Referring to fig. 3, fig. 3 is a structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 3, the hardware structure may include: a processor and a machine-readable storage medium having stored thereon machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the methods disclosed in the above examples of the present application.

Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the method disclosed in the above example of the present application can be implemented.

The machine-readable storage medium may be, for example, any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (random Access Memory), volatile Memory, non-volatile Memory, flash Memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a dvd, etc.), or similar storage medium, or a combination thereof.

The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims

1. A target identification method for adaptive learning rate collaborative optimization is characterized by comprising the following steps:

(ii) a The object model

And a reference model

is a target model obtained by training in the t-2 generation;

based on object model

(ii) a Wherein k represents a training batch;

according to sub-model

And object model

Distance between them determining submodel

And object model

With the object model

Fusing to obtain a fusion model;

combining the fusion model with the target model

Combining to obtain a target model trained in the t generation

To pass the trained target model

And carrying out target identification.

2. The method of claim 1, wherein the dependency submodel

With the object model

Distance between them determining submodel

With the object model

The fused target model fusion weights include:

according to sub-model

With the object model

Generating candidate model fusion weight according to the Manhattan distance between the two models; wherein the candidate model fusion weight is represented by:

wherein the content of the first and second substances,

represents

And

the manhattan distance between the two electrodes,

represents

And

the manhattan distance between them, M represents the candidate model fusion weight;

3. The method of claim 2, wherein the converting each row of the candidate model fusion weights comprises:

and converting each line in the candidate model fusion weight by using a normalized exponential softmax function so that the sum of weight parameters in each line after conversion is close to 1.

4. The method according to claim 2 or 3, wherein the target model fusion weight is represented by the following formula:

is formed by

The light-emitting diode is obtained by conversion,

is through pair

The light-emitting diode is obtained by conversion,

representing target model fusion weights;

the sub-model is fused according to the target model fusion weight

With the object model

The fusion model obtained by the fusion comprises the following steps:

selecting elements of a last row from the target model fusion weights;

using the selected elements as sub-models

With the object model

Fused weights, sub-models

And object model

And fusing to obtain a fusion model.

5. The method of claim 1, wherein the fusing the model with the target model

Combining to obtain a t-generation trained target model

The method comprises the following steps:

Carrying out combination:

for the defined hyper-parameters for controlling the learning speed,

representing the fusion model.

6. The method of claim 1, wherein the sub-models are generated based on training data in each training batch

And object model

A regularization function of the Manhattan distance and a Hamilton energy loss function; the Hamiltonian energy loss function is used for helping the trained model to approach the theoretical optimal model from the energy perspective.

7. The method of claim 6, wherein the loss function is expressed by the following equation:

representing the hamiltonian energy loss function when the model is trained to the t generation,

to represent

And

the Manhattan distance between the two, a and b are [1, n +1 ]]，

=

。

8. The method of claim 6 or 7, wherein the hamiltonian energy loss function is represented by the following equation:

；

representing the hamiltonian energy loss function,

is a variable of the first order and is,

the parameters of the model are represented by,

representing output values with respect to a model

And model parameters

Co-optimized function, model output value

Is the output value of the model when the model is trained in the kth training batch in the t generation.

9. The method of claim 8, wherein the step of measuring is performed by a computer

Represented by the formula:

where lr represents an initial learning rate,

representing the loss gradient of the model training to the t generation,

the scaling parameters are represented by a scale parameter,

representing the output value of the model when the model is trained to the kth training batch when the model is trained to the t generation

With corresponding tag value

The cross entropy of (2) is used for measuring the accuracy of model training;

wherein, when setting as follows for p:

then, then

、

The update rules of (1) are respectively:

；

；

the representation is based on

Determining

A learning rate of time, said

representing model parameters when the model is trained to t generation;

the representation is based on

Determining

A learning rate of time, said

representing the hamiltonian energy loss function when the model is trained to t generations.

10. An adaptive learning rate collaborative optimization target identification device, the device comprising:

(ii) a The object model

Is a sub-model generated based on training data in each training batch in t-1 generation

And a reference model

a target model obtained by training in the t-2 generation; and, based on the object model

(ii) a Wherein k represents a training batch;

a fusion unit for determining sub-models

With the object model

Distance between them determining submodel

And object model

With the object model

Fusing to obtain a fusion model;

a determination unit for combining the fusion model with the target model

Combining to obtain a target model trained in the t generation

To pass the trained target model

And carrying out target identification.

11. An electronic device, comprising: a processor and a machine-readable storage medium;

the processor is operable to execute machine executable instructions to perform the method steps of any of claims 1-9.