WO2016062044A1 - Model parameter training method, device and system - Google Patents

Model parameter training method, device and system

Info

Publication number
WO2016062044A1
WO2016062044A1 · PCT/CN2015/076967 (CN2015076967W)
Authority
WO
WIPO (PCT)
Prior art keywords
model parameter
gradient
parameter
update
objective function
Application number
PCT/CN2015/076967
Other languages
French (fr)
Chinese (zh)
Inventor
唐胜
万吉
柴振华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2016062044A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to communication technologies, and in particular, to a model parameter training method, apparatus and system.
  • the traditional method of retrieving images based on keywords suffers from a semantic gap problem, which often causes users to fail to retrieve the images they want.
  • the Content Based Image Retrieval (CBIR) method is a retrieval approach closer to human thinking.
  • the current CBIR system relies mainly on some shallow machine learning algorithms, and its performance is greatly limited.
  • Deep Learning is the most prominent direction in machine learning in recent years.
  • its motivation is to build and simulate neural networks that analyze and learn in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images, sounds and text.
  • the concept of deep learning stems from the research of artificial neural networks, and its basic learning structure is a multi-layer neural network.
  • deep learning mimics the "deep" layer learning structure of the human brain through multiple transformation and expression steps. By exploring the deep structure, progressively more abstract hierarchical features can be learned from the data.
  • deep learning has aroused widespread attention in academia and industry, resulting in a series of Deep Neural Network (DNN) models, such as Deep Belief Nets (DBNs), Deep Boltzmann Machines (DBMs), and Convolutional Neural Networks (CNNs).
  • the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and then solved by the corresponding optimization algorithm.
  • the optimization problem to be solved is defined as $w^* = \arg\min_{w} L(w)$, where $L(w) = \sum_{x \in X} l(w; x)$, in which:
  • w is the model parameter
  • X is the training data set
  • l(w;x) is the cost function.
  • the goal of the solution is to find a set of optimal model parameters $w^*$ such that the model has the lowest total cost on the training data set.
  • l(w;x) is usually related to the classification error rate, so minimizing the objective function L(w) is equivalent to minimizing the classification error rate.
  • L(w) is usually a complex nonlinear function, and often the global optimal solution w * cannot be obtained, but only the local optimal solution can be obtained.
  • the solution to the problem needs to be iterated on the training data.
  • the commonly used methods are stochastic gradient descent method, Newton method and quasi-Newton method.
  • Stochastic Gradient Descent is an optimization method widely used in deep learning.
  • its advantages are that it is easy to implement, fast, and usable for large-scale training sets.
  • the basic process of the stochastic gradient descent method is: iteratively calculate the cost function using the initial model parameters, judge whether the result of the iterative calculation satisfies the termination condition, and if not, update the model parameters according to the preset learning rate and the current gradient value, continuing the iterative calculation until the result satisfies the termination condition (a minimal sketch of this baseline follows this list).
  • a disadvantage of the prior-art stochastic gradient descent method is that manual parameter selection is required, including the learning rate, the termination condition, and the like.
  • when the learning rate is set too small, the training process is very slow; when the learning rate is set too large, the update may skip over the local optimal solution during iterative calculation, so that convergence slows down instead of speeding up, or the iteration even fails to converge.
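For illustration, here is a minimal sketch of this fixed-rate baseline. The cost function, the sampling callable, and the termination test are placeholders introduced here, not definitions from the patent.

```python
import numpy as np

def sgd(grad, sample, w0, lr=0.01, tol=1e-4, max_iter=100000):
    """Plain stochastic gradient descent with a fixed, manually chosen learning rate.

    grad(w, x): gradient of the cost l(w; x) on one training sample x.
    sample():   draws one training sample from the training set X.
    Termination combines a gradient-norm test with an iteration cap;
    the patent leaves the exact termination condition open.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad(w, sample())
        if np.linalg.norm(g) < tol:     # one possible termination condition
            break
        w -= lr * g                     # fixed learning rate: the drawback at issue
    return w
```

The fixed `lr` here is exactly what the embodiments below replace with an adaptively updated, per-element learning rate.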
  • embodiments of the present invention provide a model parameter training method, apparatus, and system for rapidly performing parameter training for image retrieval or image classification.
  • the objective function is iteratively calculated using the model parameters; the objective function is a cost function for image training.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function includes:
  • the learning rate is updated according to the gradient of the objective function on the previous model parameters and the first gradient.
  • the step of updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient includes: updating the learning rate corresponding to each element of the model parameters according to the per-element formula given in the claims below.
  • a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit;
  • the computing unit is configured to iteratively calculate an objective function using a model parameter, the objective function being a cost function for performing image training;
  • the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit and the rate update unit are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired;
  • the gradient determining unit is configured to determine a first gradient of the objective function on the model parameter
  • the rate update unit is configured to update a learning rate according to a parameter distribution feature that is displayed in the objective function by the model parameter;
  • the parameter updating unit is configured to update the model parameter according to the learning rate and the first gradient, and trigger the calculating unit and the termination determining unit.
  • the rate update unit is specifically configured to update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient, and further, per element, according to the formula given in the claims below.
  • an image training device, a retrieval device, and an image database;
  • the image training device includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using model parameters, where the objective function is a cost function for performing image training;
  • the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit and the rate update unit are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
  • the retrieval device is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the image retrieval result.
  • the rate update unit is specifically configured to update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient, and further, per element, according to the formula given in the claims below.
  • in the iterative process of the embodiments of the present invention, if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation continues; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameters in the objective function,
  • and the learning rate is then used to update the model parameters for the next iterative calculation, so that the variation range of the model parameters can be adaptively adjusted according to the parameter distribution characteristics of the objective function.
  • when far from a local optimum of the model parameters, a larger variation range of the model parameters can be set through the learning rate to speed up the iterative calculation.
  • when close to a local optimum, a smaller variation range of the model parameters can be set by updating the learning rate, improving the efficiency of the iterative calculation and, in turn, the speed of image training.
  • FIG. 1 is a schematic structural diagram of an image retrieval apparatus in an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a model parameter training method in an embodiment of the present invention.
  • FIG. 3 is another schematic flowchart of a model parameter training method in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 5 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 6 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 7 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a convergence test in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of an image training apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram showing the computer structure of an image training apparatus based on a model parameter training method according to an embodiment of the present invention.
  • the model parameter training method in the embodiment of the present invention is applied to the image retrieval system shown in FIG. 1, specifically:
  • in practical applications, in order for a computer to output the results a person wants when searching, a computer device is required to perform deep learning, that is, to establish and simulate a neural network for analysis and learning in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images.
  • the training device simulates the deep learning structure of the human brain through multiple transformation and expression steps; by exploring the deep structure, progressively more abstract hierarchical features can be learned from the data. Therefore, in order to realize deep learning, the image training device 11 is provided in the image retrieval system to perform training on massive data and determine the model parameters for image retrieval.
  • image data is input to the retrieval device 12 of the image retrieval system; the retrieval device 12 performs neural network feature extraction on the image data according to the model parameters determined by the image training device 11, performs a comparison search of images in the image database 13 according to the neural network features, and outputs the result of the image retrieval (a sketch of this flow follows this list).
  • the results of the image retrieval may be output in descending order of image similarity.
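As a schematic illustration of this retrieval flow, the sketch below ranks database images by cosine similarity to the query. The similarity measure and the `extract_features` callable (standing in for the trained network applied with the learned model parameters) are assumptions, since the patent only specifies that results are output in descending order of similarity.

```python
import numpy as np

def retrieve(query_image, db_features, db_ids, extract_features, top_k=10):
    """Rank database images by similarity to the query image.

    extract_features: the trained neural network, parameterized by the
    model parameters produced by the image training device.
    db_features: one feature row per database image.
    """
    q = extract_features(query_image)                 # 1-D feature vector
    norms = np.linalg.norm(db_features, axis=1) * np.linalg.norm(q) + 1e-12
    sims = db_features @ q / norms                    # cosine similarity (assumed)
    order = np.argsort(-sims)                         # descending similarity
    return [(db_ids[i], float(sims[i])) for i in order[:top_k]]
```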
  • the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and the problem is then solved by a corresponding optimization algorithm; the goal of the solution is to find a set of optimal model parameters such that the model has the lowest cost on the training data set.
  • the stochastic gradient descent method can be used to solve the optimal model parameters.
  • however, with the stochastic gradient descent method alone, the speed of image training is not ideal.
  • in the embodiments of the present invention, the model parameter training method is optimized and improved on the basis of the stochastic gradient descent method.
  • the model parameters used in the current iterative calculation are the model parameters updated after the previous iteration calculation.
  • the model parameter used in the current iterative calculation is used as the first model parameter
  • the model parameter used in the previous iterative calculation is used as the second model parameter
  • the gradient of the objective function on the first model parameter is used as the first gradient
  • a gradient of the objective function on the second model parameter is used as a second gradient.
  • in the first iterative calculation, the initial model parameters are used as the first model parameters.
  • alternatively, the initial model parameters are updated for the first time using an initial learning rate, and the updated model parameters are taken as the first model parameters of the first iterative calculation.
  • the model parameter training method in the embodiments of the present invention is applied to the iterative calculations after this "first update of the initial model parameters".
  • an embodiment of a method for training a model parameter in an embodiment of the present invention includes:
  • the image training device iteratively calculates an objective function using a first model parameter, the objective function being a cost function for performing image training.
  • assume the mapping of the input images through the neural network is $\{\phi_w(q), \phi_w(q^+), \phi_w(q^-)\}$, where $\phi_w(q)$, $\phi_w(q^+)$ and $\phi_w(q^-)$ are all one-dimensional column vectors used as image feature representations; a cost function can then be defined over such triplets (an illustrative form is sketched below).
  • the cost function may also have other representations, which need to be determined according to actual needs and are not limited herein.
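The patent does not reproduce the exact triplet cost in this text; purely as an illustration, one common hinge-style triplet cost over the features $\phi_w(q)$, $\phi_w(q^+)$, $\phi_w(q^-)$ is sketched below. The margin value and the squared-distance form are assumptions, not the patent's formula.

```python
import numpy as np

def triplet_cost(phi_q, phi_pos, phi_neg, margin=1.0):
    """Hinge-style triplet cost: the query should be closer to the positive
    image than to the negative image by at least `margin`. One common form,
    not necessarily the one used in the patent."""
    d_pos = np.sum((phi_q - phi_pos) ** 2)   # squared distance to positive
    d_neg = np.sum((phi_q - phi_neg) ** 2)   # squared distance to negative
    return max(0.0, margin + d_pos - d_neg)
```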
  • after the image training device iteratively calculates the objective function using the first model parameter, it determines whether the result of the current iterative calculation satisfies the termination condition; if not, it determines the first gradient of the objective function on the model parameter and updates the learning rate according to the parameter distribution characteristics exhibited by the model parameter in the objective function. The learning rate is used to determine the update amplitude of the first model parameter.
  • the "parameter distribution feature exhibited by the model parameter in the objective function" may be expressed as the gradient change at the corresponding parameter point on the function curve of the objective function.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values; for another example, it may be terminated when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • step 201 and step 202 are triggered in sequence until the result of the iterative calculation satisfies the termination condition, then the iterative calculation is stopped, and the first model parameter that satisfies the termination condition is acquired.
  • if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation continues; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameter in the objective function, and the learning rate is then used to update the model parameters used in the next iterative calculation, so that the variation range of the model parameters can be adaptively adjusted according to the parameter distribution characteristics of the objective function.
  • when far from a local optimum of the model parameters, a larger variation range of the model parameters can be set through the learning rate to speed up the iterative calculation; when close to a local optimum, a smaller variation range can be set by updating the learning rate, improving the efficiency of the iterative calculation and, in turn, the speed of image training.
  • the iterative calculation on training data can also use the Newton method and the quasi-Newton method, but the second-order partial derivatives and the Hessian matrix need to be calculated in the process, the computational complexity is high, and sometimes the Hessian matrix of the objective function cannot remain positive definite, causing the Newton method or quasi-Newton method to fail.
  • the model parameter determination method proposed by the embodiments of the present invention requires neither second-derivative information nor the calculation (or approximate calculation) of the Hessian matrix, so it is more efficient than the Newton and quasi-Newton methods and can also be used to solve other unconstrained, constrained, or large-scale nonlinear optimization problems.
  • the subscript k indicates that the parameter corresponds to the k-th (current) iterative calculation
  • the superscript j indicates the j-th element of the corresponding parameter vector.
  • another embodiment of the method for determining a model parameter in the embodiment of the present invention includes:
  • the image training device iteratively calculates an objective function using a first model parameter, which is a cost function for performing image training.
  • after the image training device iteratively calculates the objective function using the first model parameter, it determines whether the result of the current iterative calculation satisfies the termination condition; if yes, it stops the iterative calculation and acquires the first model parameter that satisfies the termination condition; if no, step 303 is performed.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the image training device determines a first gradient of the objective function on the model parameter and updates the learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; the learning rate is used to determine the update amplitude of the first model parameter.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the gradient value $g_k$ of the objective function $L(w)$ at the first model parameter $w_k$ may specifically be $g_k = \nabla_w L(w)\big|_{w=w_k}$.
  • the learning rate is updated according to the second gradient, the model parameter change amount, and the first gradient; specifically, the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update is

    $\eta_{k+1}^{j} = \dfrac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$

  • the model parameter change amount $\Delta w_{k+1}^{j}$ is the absolute value of the difference between an element of the first model parameter and the element at the corresponding order or position in the second model parameter (a sketch of this per-element update follows).
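Here is a sketch of this per-element learning-rate update, using the formula as reconstructed above; the small `eps` guarding against a zero gradient difference is an implementation safeguard added here, not part of the patent text.

```python
import numpy as np

def update_learning_rate(w_curr, w_prev, g_curr, g_prev, eps=1e-12):
    """Per-element rate: eta[j] = |w_k[j] - w_{k-1}[j]| / |g_k[j] - g_{k-1}[j]|.

    w_curr, w_prev: the first and second model parameters (current and previous).
    g_curr, g_prev: the first and second gradients at those parameters.
    """
    delta_w = np.abs(w_curr - w_prev)    # model parameter change amount
    delta_g = np.abs(g_curr - g_prev)    # per-element gradient change
    return delta_w / (delta_g + eps)     # updated per-element learning rate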
  • for comparison, the method for updating the first model parameter in the stochastic gradient descent method is $w_{k+1} = w_k - \eta_k g_k$.
  • in this embodiment, the learning rate $\eta_k$ is proportional to the absolute value of the model parameter change amount, $|\Delta w_k|$,
  • where $\lambda_k$ is the proportionality parameter between the learning rate and the change amount of the model parameters (Equation 8).
  • combining Equation 8 with the parameter update rule yields the relationship for the learning rate $\eta_k$ given by the per-element formula above (Equation 9).
  • the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • step 301 and step 302 are then triggered in sequence until the result of the iterative calculation satisfies the termination condition, at which point the iterative calculation stops and the first model parameter that satisfies the termination condition is acquired (the steps are assembled into a sketch below).
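Assembling the steps of this embodiment, a minimal sketch of the whole adaptive training loop follows. The initial learning rate, gradient routine, and gradient-norm termination test are placeholders; the per-element rate update matches the formula reconstructed above.

```python
import numpy as np

def train(grad, w0, eta0=0.01, tol=1e-4, max_iter=100000, eps=1e-12):
    """Adaptive-rate training loop (a sketch, assuming the reconstructed rule).

    grad(w): gradient of the objective L at w (the first gradient).
    The initial model parameters are updated once with the initial rate eta0;
    thereafter each element uses eta[j] = |dw[j]| / |dg[j]|.
    """
    w_prev = np.asarray(w0, dtype=float).copy()
    g_prev = grad(w_prev)
    w = w_prev - eta0 * g_prev                 # first update with the initial rate
    for _ in range(max_iter):
        g = grad(w)                            # first gradient at current parameters
        if np.linalg.norm(g) < tol:            # one possible termination condition
            break
        eta = np.abs(w - w_prev) / (np.abs(g - g_prev) + eps)  # per-element rate
        w_prev, g_prev = w, g
        w = w - eta * g                        # amplitude from eta, direction from g
    return w
```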
  • point A is the parameter point corresponding to the (k-1)-th iterative calculation
  • point B is the parameter point corresponding to the k-th iterative calculation
  • point C is the parameter point corresponding to a local optimum value of the objective function. According to Equation 9, the next iteration (the (k+1)-th) falls between point A and point B, near point A, adaptively approaching the local optimal parameter point C (a toy numerical illustration follows).
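As a toy numerical illustration (all numbers invented for exposition, not taken from the patent): suppose at point A ($w_{k-1} = 0$) the gradient is $g_{k-1} = -2$, and at point B ($w_k = 1$) it is $g_k = 3$; the sign change of the gradient means a local optimum C lies between A and B. The updated rate is $\eta = |w_k - w_{k-1}| / |g_k - g_{k-1}| = 1/5$, so the next parameter is $w_{k+1} = w_k - \eta\, g_k = 1 - 0.6 = 0.4$, which indeed falls between A and B, on the A side of the midpoint.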
  • point A is the parameter point corresponding to the (k-1)-th iterative calculation
  • point B is the parameter point corresponding to the k-th iterative calculation
  • point C is the parameter point corresponding to a local optimum value of the objective function.
  • an image retrieval experiment was performed on the Paris data set.
  • the data set has 6,412 images and contains 11 landmarks in Paris; for each landmark, 5 images were selected for use as queries.
  • the CNNs features are first learned on the ImageNet dataset, and the model is then tuned on the Paris dataset using both SGD and the method of the present invention. Since the model contains about 60 million parameters, neither the Newton method nor the quasi-Newton method can be used for model training; therefore only the method of the present invention and the currently widely used SGD method were compared in the experiment. The comparison covers the convergence speed of SGD and the proposed method during model tuning, and the mean average precision (mAP) of the tuned model in the image retrieval task (the mAP metric is sketched below).
  • FIG. 8 compares the convergence speed of training under the SGD algorithm and the model parameter training method in the embodiment of the present invention. Since training uses randomly selected triplets, the loss function fluctuates considerably, so the average of the last hundred iterations is taken to smooth the convergence curve. It can be seen that the convergence speed of the model parameter training method in the embodiment of the present invention is significantly faster than that of the SGD algorithm, and its iterative error is much lower than that of SGD; after 10,000 iterations its error already reaches the final convergence error that SGD attains only after 100,000 iterations (0.0125). That is, under the same error termination condition, the model parameter training method in the embodiment of the present invention is about 10 times faster.
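For reference, here is a sketch of how mAP over the query set might be computed; this is the standard definition of mean average precision, assumed here since the patent does not spell out the metric.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance is a 0/1 sequence in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_hits * rel).sum() / rel.sum())

def mean_average_precision(all_ranked_relevance):
    """mAP: the mean of AP over all queries (e.g., the 55 Paris queries)."""
    return float(np.mean([average_precision(r) for r in all_ranked_relevance]))
```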
  • an embodiment of an image training apparatus in an embodiment of the present invention includes:
  • a computing unit 901, a termination determining unit 902, a gradient determining unit 903, a rate updating unit 904, and a parameter updating unit 905;
  • the calculating unit 901 is configured to perform an iterative calculation on the objective function using a model parameter, where the objective function is a cost function for performing image training;
  • the termination determining unit 902 is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit 903 and the rate updating unit 904 are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired;
  • the gradient determining unit 903 is configured to determine a first gradient of the objective function on the model parameter
  • the rate update unit 904 is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;
  • the parameter updating unit 905 is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit 901 and the termination determination unit 902.
  • rate update unit 904 is specifically configured to:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the calculation unit 901 performs an iterative calculation on the objective function using the first model parameter, which is a cost function for performing image training.
  • the termination determination unit 902 determines whether the result of the current iterative calculation satisfies the termination condition, and if not, executes the gradient determination unit 903 and the rate update unit 904.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the gradient determining unit 903 determines a first gradient based on the objective function, the first gradient being a gradient of the objective function at the first model parameter.
  • the gradient value $g_k$ of the objective function $L(w)$ at the first model parameter $w_k$ may specifically be $g_k = \nabla_w L(w)\big|_{w=w_k}$.
  • the rate update unit 904 updates the learning rate according to the parameter distribution features exhibited by the model parameters in the objective function, the learning rate being used to determine the update magnitude of the first model parameter.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated according to the gradient of the objective function on the previous model parameter and the first gradient;
  • specifically, the learning rate is updated according to the second gradient, the model parameter change amount, and the first gradient, per the per-element formula given above.
  • the parameter updating unit 905 updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • the calculation unit 901 is then triggered again, and the iterative calculation continues on the objective function using the updated first model parameter until the result of the iterative calculation satisfies the termination condition, at which point the iteration stops and the first model parameter that satisfies the termination condition is acquired.
  • FIG. 10 is a schematic structural diagram of an image training device 20 according to an embodiment of the present invention.
  • the image training device 20 can include an input device 210, an output device 220, a processor 230, and a memory 240.
  • the image training device 20 provided by the embodiment of the present invention is applied to a stream computing system, where the stream computing system is configured to process services; the stream computing system includes a master node and a plurality of working nodes, and the master node schedules the sub-services included in a service to the plurality of working nodes for processing.
  • Memory 240 can include read only memory and random access memory and provides instructions and data to processor 230. A portion of memory 240 may also include non-volatile random access memory (NVRAM).
  • the memory 240 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof:
  • operation instructions: include various operation instructions for implementing various operations.
  • operating system: includes various system programs for implementing basic services and handling hardware-based tasks.
  • the processor 230 performs the following operations by calling an operation instruction stored in the memory 240 (the operation instruction can be stored in the operating system):
  • the processor 230 is specifically configured to: iteratively calculate the objective function using the first model parameter, where the objective function is a cost function for performing image training; if the result of the iterative calculation does not satisfy the termination condition, determine the first gradient of the objective function on the model parameter, and update the learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; update the first model parameter according to the learning rate and the first gradient; and repeat the above steps until the result of the iterative calculation satisfies the termination condition, acquiring the first model parameter that satisfies the termination condition.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the processor 230 controls the operation of the image training device 20; the processor 230 may also be referred to as a CPU (Central Processing Unit).
  • the various components of the image training device 20 are coupled together by a bus system 250, where the bus system 250 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are labeled as bus system 250 in the figure.
  • Processor 230 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 230 or an instruction in a form of software.
  • the processor 230 described above may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention may be implemented or carried out by such a processor.
  • the general purpose processor may be a microprocessor, or any conventional processor.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 240, and the processor 230 reads the information in the memory 240 and performs the steps of the above method in combination with its hardware.
  • an embodiment of an image retrieval system in an embodiment of the present invention includes:
  • an image training device 11, a retrieval device 12, and an image database 13;
  • the image training device 11 includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
  • the retrieval device 12 is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database 13 according to the neural network features, and output the result of the image retrieval.
  • rate update unit is specifically configured to:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the disclosed apparatus and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • in addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

A model parameter training method, device and system, used for rapidly carrying out parameter training for image retrieval or image classification. The method comprises: using model parameters to carry out iterative computation on an objective function, the objective function being a cost function used for image training; if the result of the iterative computation does not meet the termination condition, determining the first gradient of the objective function on the model parameters, and updating the learning rate according to the parameter distribution characteristics of the model parameters in the objective function; updating the model parameters according to the learning rate and the first gradient; repeating the previous steps until the result of the iterative computation meets the termination condition; and obtaining the model parameters corresponding to the result of the iterative computation that meets the termination condition.

Description

Model parameter training method, device and system

The present application claims priority to Chinese Patent Application No. 201410579249.2, entitled "A Model Parameter Training Method, Apparatus and System", filed with the Chinese Patent Office on October 24, 2014, the entire contents of which are incorporated herein by reference.
Technical field

The present invention relates to communication technologies, and in particular, to a model parameter training method, apparatus and system.
Background

The traditional method of retrieving images based on keywords suffers from a semantic gap problem, which often causes users to fail to retrieve the images they want. The Content Based Image Retrieval (CBIR) method is a retrieval approach closer to human thinking. Current CBIR systems rely mainly on shallow machine learning algorithms, and their performance is greatly limited. Deep Learning is the most prominent direction in machine learning in recent years. Its motivation is to build and simulate neural networks that analyze and learn in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images, sounds and text. The concept of deep learning stems from research on artificial neural networks, and its basic learning structure is a multi-layer neural network. Unlike the "shallow" learning structures of traditional machine learning algorithms, deep learning mimics the "deep" learning structure of the human brain through multiple transformation and expression steps. By exploring the deep structure, progressively more abstract hierarchical features can be learned from the data.

Deep learning has aroused widespread attention in academia and industry, resulting in a series of Deep Neural Network (DNN) models, such as Deep Belief Nets (DBNs), Deep Boltzmann Machines (DBMs), and Convolutional Neural Networks (CNNs).

Studying efficient learning algorithms for deep neural networks to realize fast training on massive data is the first problem to be solved in the research and development of deep learning technology. The study of learning algorithms for deep neural networks is therefore particularly important.
In the process of image training by a machine, the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and the problem is then solved by a corresponding optimization algorithm. The optimization problem to be solved is defined as follows:

$$w^* = \arg\min_{w} L(w), \qquad L(w) = \sum_{x \in X} l(w; x)$$

where w is the model parameter, X is the training data set, and l(w;x) is the cost function. The goal of the solution is to find a set of optimal model parameters w* such that the model has the lowest total cost on the training data set. Taking the classification problem as an example, l(w;x) is usually related to the classification error rate, so minimizing the objective function L(w) is equivalent to minimizing the classification error rate.

In particular, in deep learning, L(w) is usually a complex nonlinear function; often the global optimal solution w* cannot be obtained, and only a local optimal solution can be obtained. The solution of the problem needs to proceed iteratively on the training data; commonly used methods are the stochastic gradient descent method, the Newton method, and the quasi-Newton method.
In the prior art, Stochastic Gradient Descent (SGD) is an optimization method widely adopted in deep learning. Its advantages are that it is easy to implement, fast, and usable for large-scale training sets.

The basic process of the stochastic gradient descent method is: iteratively calculate the cost function using the initial model parameters, judge whether the result of the iterative calculation satisfies the termination condition, and if not, update the model parameters according to the preset learning rate and the current gradient value, continuing the iterative calculation until the result of the iterative calculation satisfies the termination condition.

A disadvantage of the prior-art stochastic gradient descent method is that manual parameter selection is required, including the learning rate, the termination condition, and the like. When the learning rate is set too small, the training process is very slow; when the learning rate is set too large, the update may skip over the local optimal solution during iterative calculation, so that convergence slows down instead of speeding up, or the iteration even fails to converge.
Summary of the invention

Embodiments of the present invention provide a model parameter training method, apparatus, and system for rapidly performing parameter training for image retrieval or image classification.

The model parameter training method provided by the first aspect of the embodiments of the present invention includes:

iteratively calculating an objective function using model parameters, the objective function being a cost function for performing image training;

if the result of the iterative calculation does not satisfy a termination condition, determining a first gradient of the objective function on the model parameters, and updating a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;

updating the model parameters according to the learning rate and the first gradient;

repeating the above steps until the result of the iterative calculation satisfies the termination condition, and acquiring the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition.
With reference to the first aspect, in a first possible implementation, updating the learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function includes:

updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.

With reference to the first possible implementation of the first aspect, in a second possible implementation, updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient includes:

updating the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
The image training apparatus provided by the second aspect of the embodiments of the present invention includes:

a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit;

the calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training;

the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition;

the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters;

the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;

the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.

With reference to the second aspect, in a first possible implementation, the rate update unit is specifically configured to:

update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the rate update unit is specifically configured to:

update the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
The image retrieval system provided by the third aspect of the embodiments of the present invention includes:

an image training device, a retrieval device, and an image database;

the image training device includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit. The retrieval device is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the result of the image retrieval.

With reference to the third aspect, in a first possible implementation, the rate update unit is specifically configured to:

update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
With reference to the first possible implementation of the third aspect, in a second possible implementation, the rate update unit is specifically configured to:

update the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:
In the iterative process of the embodiments of the present invention, if the result of an iterative calculation does not satisfy the termination condition, the iterative calculation continues. Before the next iterative calculation, the learning rate is updated according to the parameter distribution characteristics that the model parameter exhibits in the objective function, and the learning rate is then used to update the model parameter used by the next iterative calculation, so that the step size of the model parameter is adapted to the parameter distribution characteristics of the objective function. When far from a local optimum of the model parameter, the learning rate can set a large parameter change to speed up the iterative calculation; when close to a local optimum, the updated learning rate can set a small parameter change. This improves the efficiency of the iterative calculation and, in turn, the speed of image training.
DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic architectural diagram of an image retrieval system according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a model parameter training method according to an embodiment of the present invention;
FIG. 3 is another schematic flowchart of a model parameter training method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 5 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 6 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 7 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a convergence test according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an image training apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the computer structure of an image training apparatus based on the model parameter training method according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Referring to FIG. 1, the model parameter training method in the embodiments of the present invention is applied to the image retrieval system shown in FIG. 1. Specifically:
In practical applications, to enable a computer to output the results a user wants when retrieving images, a computer device needs to perform deep learning, that is, to build and simulate a neural network that analyzes and learns in the manner of the human brain and interprets data by imitating its mechanisms. The image training device imitates the deep layered learning structure of the human brain through multiple transformation and expression steps; by exploring this deep structure, gradually more abstract hierarchical features can be learned from the data. Therefore, to implement deep learning, an image training device 11 is provided in the image retrieval system to perform training on massive data and determine the model parameters used for image retrieval.
When a user needs to perform image retrieval, image data is input into the retrieval device 12 of the image retrieval system. The retrieval device 12 performs neural network feature extraction on the image data according to the model parameters determined by the image training device 11, performs a comparative search of images in the image database 13 according to the neural network features, and outputs the result of the image retrieval. Specifically, the results may be output in descending order of image similarity.
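As a rough sketch of this retrieval flow (the function name `retrieve`, the `(N, d)` feature-matrix layout, and the dot-product similarity are illustrative assumptions consistent with the feature representation used below, not part of the claimed embodiment):

```python
import numpy as np

def retrieve(query_feature, db_features, top_k=10):
    """Rank database images by similarity to a query image.

    query_feature: 1-D feature vector extracted from the query image.
    db_features:   (N, d) matrix of features of the N database images.
    Returns the indices of the top_k images in descending similarity.
    """
    scores = db_features @ query_feature   # dot-product similarity
    return np.argsort(-scores)[:top_k]     # descending order of similarity
```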
In the process of image training by the image training device 11, the image retrieval problem to be solved is first abstracted into an optimization problem and an objective function is defined; the objective function is then solved by a corresponding optimization algorithm. The goal of the solution is to find a set of optimal model parameters such that the model has the lowest total cost on the training data set.
In the prior art, the stochastic gradient descent (SGD) method can be used to solve for the optimal model parameters; however, the resulting speed of image training is not ideal. The model parameter training method in the embodiments of the present invention optimizes and improves this stochastic gradient descent method; for details, refer to the following embodiments:
In practical applications, when performing the iterative calculation of the objective function, an initial model parameter and an initial learning rate need to be set. Specifically, except for the first iterative calculation, the model parameter used in the current iterative calculation is always the model parameter updated after the previous iterative calculation. For ease of description, in the embodiments of the present invention, the model parameter used in the current iterative calculation is referred to as the first model parameter, and the model parameter used in the previous iterative calculation is referred to as the second model parameter; the gradient of the objective function at the first model parameter is referred to as the first gradient, and the gradient of the objective function at the second model parameter is referred to as the second gradient.
When the iterative calculation is performed for the first time, the initial model parameter serves as the first model parameter. When the first iterative calculation does not satisfy the termination condition, the initial learning rate is used to update the initial model parameter for the first time, and the updated model parameter serves as the first model parameter of the next iterative calculation. The model parameter training method of the embodiments of the present invention is applied in the iterative calculations that follow this first update of the initial model parameter.
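A minimal sketch of this bootstrapping step, assuming a gradient oracle `grad` is available; the names `w0` and `eta0` for the initial model parameter and the initial learning rate are illustrative:

```python
def first_update(w0, eta0, grad):
    """Perform the first, plain update of the initial model parameter, so
    that a previous parameter change and a previous gradient exist for the
    adaptive updates of the following embodiments."""
    g0 = grad(w0)            # gradient at the initial model parameter
    w1 = w0 - eta0 * g0      # first update with the initial learning rate
    dw0 = w1 - w0            # model parameter change of this first update
    return w1, g0, dw0       # first model parameter, second gradient, change
```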
Referring to FIG. 2, an embodiment of the model parameter training method in the embodiments of the present invention includes:
201. Perform an iterative calculation on an objective function using a first model parameter.
The image training device iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training. Exemplarily, taking metric learning as an example, define w as the convolutional neural network parameters, and let the input image x be a triplet of three pictures, $x = \{q, q^+, q^-\}$, where $(q, q^+)$ is a similar image pair and $(q, q^-)$ is a dissimilar image pair. The mapping of the input images through the neural network is $\{\varphi_w(q), \varphi_w(q^+), \varphi_w(q^-)\}$, where $\varphi_w(q)$, $\varphi_w(q^+)$, and $\varphi_w(q^-)$ are all one-dimensional column vectors used as image feature representations. The cost function may then be:
$$l(w, x) = \max\left(0,\ \gamma - \varphi_w(q) \cdot \varphi_w(q^+) + \varphi_w(q) \cdot \varphi_w(q^-)\right);$$
It can be understood that, in practical applications, the cost function may also take other forms, depending on actual requirements, which is not limited herein.
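For illustration, a numpy sketch of this triplet cost; the feature vectors are assumed to be computed elsewhere by the network φ_w, and `gamma` stands for the margin γ:

```python
import numpy as np

def triplet_cost(phi_q, phi_pos, phi_neg, gamma=1.0):
    """l(w, x) = max(0, gamma - phi(q).phi(q+) + phi(q).phi(q-)) for one
    triplet, where phi_* are the 1-D feature vectors of (q, q+, q-)."""
    return max(0.0, gamma - phi_q @ phi_pos + phi_q @ phi_neg)

q, qp, qn = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.9, 0.1])
print(triplet_cost(q, qp, qn))   # 0.44: positive because the margin is violated
```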
202. If the result of the iterative calculation does not satisfy a termination condition, determine the first gradient and update the learning rate.
After the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition. If not, it determines the first gradient of the objective function at the model parameter and updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, "the parameter distribution characteristics that the model parameter exhibits in the objective function" may be expressed as the gradient change at the corresponding parameter point on the graph of the objective function.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
203. Update the first model parameter according to the learning rate and the first gradient.
The image training device updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, step 201 and step 202 are triggered in sequence, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
In the iterative process of this embodiment of the present invention, if the result of an iterative calculation does not satisfy the termination condition, the iterative calculation continues. Before the next iterative calculation, the learning rate is updated according to the parameter distribution characteristics that the model parameter exhibits in the objective function, and the learning rate is then used to update the model parameter used by the next iterative calculation, so that the step size of the model parameter is adapted to the parameter distribution characteristics of the objective function. When far from a local optimum of the model parameter, the learning rate can set a large parameter change to speed up the iterative calculation; when close to a local optimum, the updated learning rate can set a small parameter change. This improves the efficiency of the iterative calculation and, in turn, the speed of image training.
In practical applications, the iterative calculation on training data may also use the Newton method or quasi-Newton methods; however, these require computing second-order partial derivatives and the Hessian matrix, which is computationally expensive, and sometimes the Hessian matrix of the objective function cannot remain positive definite, causing the Newton or quasi-Newton method to fail. The model parameter determination method proposed in the embodiments of the present invention requires neither second-derivative information nor the computation or approximation of the Hessian matrix, and is therefore more efficient than the Newton and quasi-Newton methods; it can also be used to solve other unconstrained, constrained, or large-scale nonlinear optimization problems.
The model parameter determination method in the embodiments of the present invention is described in detail below. In the embodiments of the present invention, the subscript k denotes a quantity associated with the iterative calculation currently being performed, and the superscript j denotes a quantity associated with the j-th element of the first model parameter. Referring to FIG. 3, another embodiment of the model parameter determination method in the embodiments of the present invention includes:
301. Perform an iterative calculation on the objective function using the first model parameter.
The image training device iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training.
302. Determine whether the result of the iterative calculation satisfies the termination condition.
After the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition. If so, the iterative calculation is stopped and the first model parameter that satisfies the termination condition is obtained; if not, step 303 is performed.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
303. Determine the first gradient and update the learning rate.
The image training device determines the first gradient of the objective function at the model parameter, and updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
Exemplarily, the gradient value $g_k$ of the objective function L(w) at the first model parameter $w_k$ may be computed as:
$$g_k = L'(w_k)$$
Exemplarily, updating the learning rate according to the second gradient, the model parameter change amount, and the first gradient is specifically: updating the learning rate corresponding to each element of the first model parameter, wherein, when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1.
Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
Specifically, for one element of the model parameter, the model parameter change amount is the difference between the element in the first model parameter and the element at the corresponding position in the second model parameter, taken in absolute value.
The derivation of Formula 1 is described in detail below:
In practical applications, the first model parameter in the stochastic gradient descent method is updated as:
Formula 2: $w_{k+1} = w_k - \eta_k g_k$;
Rearranging Formula 2 gives Formula 3, in which the model parameter change amount $\Delta w_k$ of the model parameter w is:
Formula 3: $\Delta w_k = w_{k+1} - w_k = -\eta_k g_k$;
Because the change of Δw is continuous, the learning rate $\eta_k$ is proportional to the absolute value $\left|\Delta w_{k-1}\right|$ of the model parameter change amount of the previous iterative calculation, the relationship being:
Formula 4: $\eta_k = \lambda_k \left|w_k - w_{k-1}\right| = \lambda_k \left|\Delta w_{k-1}\right|$;
where $\lambda_k$ is the proportionality parameter between the learning rate and the model parameter change amount.
From Formula 3 and Formula 4, the relationship between $\Delta w_k$ and $\lambda_k$ is obtained:
Formula 5: $\Delta w_k = -\lambda_k \left|\Delta w_{k-1}\right| g_k$;
Further, from Formula 5:
Formula 6: $w_{k+1} = w_k + \Delta w_k = w_k - \lambda_k \left|\Delta w_{k-1}\right| g_k$;
When the j-th element of the first model parameter needs to be processed, converting Formula 6 to the per-element form and taking the proportionality parameter of the j-th element to be inversely proportional to the change in the gradient gives:
Formula 7:
$$\lambda_k^j = \frac{1}{\left|g_k^j - g_{k-1}^j\right|}$$
Substituting Formula 7 into Formula 5 gives:
Formula 8:
$$\Delta w_k^j = -\frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}\, g_k^j$$
Combining Formula 8 and Formula 3, the relationship for the learning rate $\eta_k$ is obtained:
Formula 1:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
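Putting Formulas 1 and 8 together gives one element-wise update step; the sketch below assumes numpy arrays, and the small `eps` added to the denominator (to guard against a zero gradient difference) is an assumption the derivation leaves implicit:

```python
import numpy as np

def adaptive_update(w_k, g_k, g_prev, dw_prev, eps=1e-12):
    """eta_k = |dw_prev| / |g_k - g_prev| per element (Formula 1), then
    dw_k = -eta_k * g_k (Formula 8); returns the new parameters and dw_k."""
    eta_k = np.abs(dw_prev) / (np.abs(g_k - g_prev) + eps)
    dw_k = -eta_k * g_k
    return w_k + dw_k, dw_k
```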
304. Update the first model parameter according to the learning rate and the first gradient.
The image training device updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, step 301 and step 302 are triggered in sequence, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
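A sketch of the complete loop of steps 301 to 304, assuming loss and gradient oracles; the two termination conditions shown (a loss threshold `tol` and an iteration cap `max_iter`) follow the examples mentioned above, and their concrete values are illustrative:

```python
import numpy as np

def train(w0, eta0, loss, grad, tol=1e-4, max_iter=100000, eps=1e-12):
    """Iterate steps 301-304 until a termination condition is satisfied."""
    g_prev = grad(w0)                  # second gradient
    dw_prev = -eta0 * g_prev           # first, plain update of the parameters
    w = w0 + dw_prev
    for _ in range(max_iter):          # step 301: iterative calculation
        if loss(w) <= tol:             # step 302: termination condition
            break
        g = grad(w)                    # step 303: first gradient
        eta = np.abs(dw_prev) / (np.abs(g - g_prev) + eps)  # Formula 1
        dw_prev = -eta * g             # step 304: update the parameters
        w = w + dw_prev
        g_prev = g
    return w
```

On a one-dimensional quadratic such as L(w) = w², this loop reaches the minimum in two adaptive steps, because the gradient difference |g_k - g_{k-1}| then equals the exact second derivative times |Δw_{k-1}|, so Formula 1 reproduces a Newton-like step.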
The effectiveness of the learning rate update in this embodiment of the present invention is analyzed below, taking the one-dimensional case (that is, j = 1; the multi-dimensional case follows by analogy, and the superscript j is dropped for brevity) as an example:
First, when $g_k \times g_{k-1} < 0$, it can be seen from Formula 8 that:
Formula 9:
$$\left|\Delta w_k\right| = \frac{\left|g_k\right|}{\left|g_k\right| + \left|g_{k-1}\right|}\left|\Delta w_{k-1}\right|$$
When $\left|g_k\right| = \left|g_{k-1}\right|$, refer to FIG. 4: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\left|\Delta w_k\right| = \frac{1}{2}\left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls exactly in the middle between point A and point B, adaptively approaching the local optimal parameter point C.
When $\left|g_k\right| < \left|g_{k-1}\right|$, refer to FIG. 5: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\left|\Delta w_k\right| < \frac{1}{2}\left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls between point A and point B, close to point B, adaptively approaching the local optimal parameter point C.
When $\left|g_k\right| > \left|g_{k-1}\right|$, refer to FIG. 6: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\frac{1}{2}\left|\Delta w_{k-1}\right| < \left|\Delta w_k\right| < \left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls between point A and point B, close to point A, adaptively approaching the local optimal parameter point C.
Second, when $g_k \times g_{k-1} > 0$, it can be seen from Formula 8 that:
Formula 10:
$$\eta_k = \frac{\left|\Delta w_{k-1}\right|}{\left|\,\left|g_k\right| - \left|g_{k-1}\right|\,\right|}$$
Refer to FIG. 7: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 10, the larger the absolute value of $\left|g_{k-1}\right| - \left|g_k\right|$, that is, the larger the change between the current gradient and the previous gradient, the smaller the value of $\eta_k$, so the learning rate adaptively decreases; and vice versa.
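The opposite-sign cases above can be checked numerically from Formula 9; the gradient values below are arbitrary examples with |Δw_{k-1}| = 1:

```python
# |dw_k| = |g_k| / (|g_k| + |g_prev|) * |dw_prev|  when g_k * g_prev < 0
for g_prev, g_k in [(-2.0, 2.0), (-2.0, 1.0), (-2.0, 3.0)]:
    ratio = abs(g_k) / (abs(g_k) + abs(g_prev))
    print(ratio)   # 0.5 (midpoint), ~0.33 (closer to B), 0.6 (closer to A)
```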
To verify the effectiveness of the model parameter training method in the embodiments of the present invention, an image retrieval experiment was performed on the Paris dataset. This dataset contains 6,412 images covering 11 landmark buildings (landmarks) in Paris, with 5 images selected per landmark as queries. CNN features were first learned on the ImageNet dataset, and then learning and adjustment (model tuning) were performed on the Paris dataset using SGD and the method of the present invention. Because the model contains about 60 million parameters, neither the Newton method nor quasi-Newton methods can be used for model training; therefore, the experiment compares only the method of the present invention with the currently widely used SGD method. The convergence speed of SGD and of the proposed method during model tuning, as well as the mean average precision (mAP) of the tuned model on the image retrieval task, were compared.
FIG. 8 compares the training convergence speed of the SGD algorithm and of the model parameter training method in the embodiments of the present invention during model tuning. Because training uses randomly sampled triplets, the loss function fluctuates considerably, so the average over the most recent one hundred iterations is taken to smooth the convergence curve. It can be seen that the convergence speed of the model parameter training method in the embodiments of the present invention is significantly faster than that of the SGD algorithm, and its iteration error (hinge loss) is far lower than that of SGD: at 10,000 iterations, the error already reaches the final convergence error of SGD after 100,000 iterations (0.0125). That is, under the same error termination condition, the model parameter training method in the embodiments of the present invention is 10 times faster.
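The smoothing described here is a plain moving average; a sketch, assuming the per-iteration loss values are stored in an array, with the window of one hundred iterations taken from the text:

```python
import numpy as np

def smooth(losses, window=100):
    """Average each point over the most recent `window` iterations to smooth
    the fluctuating triplet-loss curve before plotting."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(losses, dtype=float), kernel, mode="valid")
```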
The image training apparatus that implements the model parameter training method in the embodiments of the present invention is described below. It should be noted that the methods described in the foregoing embodiments of the model parameter training method can be implemented in the image training apparatus of the present invention. Referring to FIG. 9, an embodiment of the image training apparatus in the embodiments of the present invention includes:
a calculation unit 901, a termination determination unit 902, a gradient determination unit 903, a rate update unit 904, and a parameter update unit 905;
the calculation unit 901 is configured to iteratively calculate an objective function using a model parameter, where the objective function is a cost function used for image training;
the termination determination unit 902 is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit 903 and the rate update unit 904 are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained;
the gradient determination unit 903 is configured to determine a first gradient of the objective function at the model parameter;
the rate update unit 904 is configured to update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function;
the parameter update unit 905 is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit 901 and the termination determination unit 902.
Further, the rate update unit 904 is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
In the process of updating the learning rate, the learning rate corresponding to each element of the first model parameter is updated; when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1.
Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
The workflow of the units in this embodiment of the present invention is described below:
The calculation unit 901 iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training.
After the objective function has been iteratively calculated using the first model parameter, the termination determination unit 902 determines whether the result of the current iterative calculation satisfies the termination condition; if not, the gradient determination unit 903 and the rate update unit 904 are executed.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
The gradient determination unit 903 determines the first gradient according to the objective function, where the first gradient is the gradient of the objective function at the first model parameter. Exemplarily, the gradient value $g_k$ of the objective function L(w) at the first model parameter $w_k$ may be computed as:
$$g_k = L'(w_k)$$
The rate update unit 904 updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes: updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
Exemplarily, updating the learning rate according to the second gradient, the model parameter change amount, and the first gradient is specifically: updating the learning rate corresponding to each element of the first model parameter; when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1. Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
The parameter update unit 905 updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, the calculation unit 901 is triggered again to continue the iterative calculation on the objective function using the updated first model parameter, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
FIG. 10 is a schematic structural diagram of an image training apparatus 20 according to an embodiment of the present invention. The image training apparatus 20 may include an input device 210, an output device 220, a processor 230, and a memory 240.
The image training apparatus 20 provided in this embodiment of the present invention is applied to a stream computing system, where the stream computing system is configured to schedule and process services; the stream computing system includes a master control node and multiple working nodes, and the master control node is configured to schedule the sub-services included in a service to the multiple working nodes for processing.
The memory 240 may include a read-only memory and a random access memory, and provide instructions and data to the processor 230. A part of the memory 240 may further include a non-volatile random access memory (NVRAM).
The memory 240 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof:
operation instructions: including various operation instructions, used to implement various operations;
an operating system: including various system programs, used to implement various basic services and handle hardware-based tasks.
In this embodiment of the present invention, the processor 230 performs the following operations by invoking the operation instructions stored in the memory 240 (the operation instructions may be stored in the operating system):
The processor 230 is specifically configured to: iteratively calculate an objective function using a first model parameter, where the objective function is a cost function used for image training; if the result of the iterative calculation does not satisfy a termination condition, determine a first gradient of the objective function at the model parameter, and update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; update the first model parameter according to the learning rate and the first gradient; and repeat the foregoing steps until the result of the iterative calculation satisfies the termination condition, to obtain the first model parameter that satisfies the termination condition.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
The processor 230 controls the operation of the image training apparatus 20; the processor 230 may also be referred to as a CPU (Central Processing Unit). The memory 240 may include a read-only memory and a random access memory, and provide instructions and data to the processor 230. A part of the memory 240 may further include a non-volatile random access memory (NVRAM). In a specific application, the components of the image training apparatus 20 are coupled together by a bus system 250, where the bus system 250 may further include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. However, for clarity of description, the various buses are all marked as the bus system 250 in the figure.
The methods disclosed in the foregoing embodiments of the present invention may be applied to the processor 230, or implemented by the processor 230. The processor 230 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 230 or by instructions in the form of software. The processor 230 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present invention may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 240, and the processor 230 reads the information in the memory 240 and completes the steps of the foregoing methods in combination with its hardware.
The image retrieval system that implements the model parameter training method in the embodiments of the present invention is described below. It should be noted that the methods described in the foregoing embodiments of the model parameter training method can be implemented in the image retrieval system of the present invention. Referring to FIG. 1, an embodiment of the image retrieval system in the embodiments of the present invention includes:
an image training device 11, a retrieval device 12, and an image database 13;
the image training device 11 includes a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using a model parameter, where the objective function is a cost function used for image training. The termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit and the rate update unit are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained. The gradient determination unit is configured to determine a first gradient of the objective function at the model parameter. The rate update unit is configured to update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function. The parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
The retrieval device 12 is configured to perform neural network feature extraction on input image data according to the model parameters determined by the image training device, perform image retrieval in the image database 13 according to the neural network features, and output the result of the image retrieval.
Further, the rate update unit is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient;
update the learning rate corresponding to each element of the first model parameter, wherein, when the j-th element of the first model parameter is processed, the learning rate is updated according to the following formula:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
For the specific operations of the image retrieval system in the embodiments of the present invention, reference may be made to the foregoing embodiments; details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the described apparatus embodiments are merely exemplary. For example, the unit division is merely logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one position or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A model parameter training method, comprising:
iteratively calculating an objective function using a model parameter, wherein the objective function is a cost function used for image training;
if the result of the iterative calculation does not satisfy a termination condition,
determining a first gradient of the objective function at the model parameter, and updating a learning rate according to parameter distribution characteristics that the model parameter exhibits in the objective function;
updating the model parameter according to the learning rate and the first gradient; and
repeating the foregoing steps until the result of the iterative calculation satisfies the termination condition, and acquiring the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition.
2. The method according to claim 1, wherein updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function comprises:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
3. The method according to claim 2, wherein updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient comprises:
updating the learning rate corresponding to each element of the model parameter, wherein, when the j-th element of the model parameter is processed, the learning rate is updated according to the following formula:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
wherein $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
4. An image training apparatus, comprising:
a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit, wherein
the calculation unit is configured to iteratively calculate an objective function using a model parameter, wherein the objective function is a cost function used for image training;
the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit and the rate update unit are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained;
the gradient determination unit is configured to determine a first gradient of the objective function at the model parameter;
the rate update unit is configured to update a learning rate according to parameter distribution characteristics that the model parameter exhibits in the objective function; and
the parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
5. The apparatus according to claim 4, wherein the rate update unit is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
  6. The device according to claim 5, wherein the rate update unit is specifically configured to:
    update the learning rate corresponding to each element of the model parameters, wherein, when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:
    η_j^{k+1} = Δw_j^{k+1} / (g_j^{k+1} − g_j^k)
    where η_j^{k+1} denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, Δw_j^{k+1} denotes the model parameter change amount corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, g_j^{k+1} denotes the first gradient corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, and g_j^k denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
  7. An image retrieval system, comprising:
    an image training device, a retrieval device, and an image database;
    the image training device comprises: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to execute the gradient determination unit and the rate update unit; if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution characteristics exhibited by the model parameters in the objective function; the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit;
    the retrieval device is configured to perform neural network feature extraction on input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the result of the image retrieval.
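On the retrieval side, claim 7 reduces to extracting neural network features for a query image with the trained model and ranking the image database by feature similarity. The sketch below assumes a feature-extraction callable, a precomputed database feature matrix, and cosine similarity as the ranking measure; none of these specifics are fixed by the claim.

```python
import numpy as np

def retrieve(extract_features, query_image, db_features, db_ids, top_k=10):
    # Feature extraction with the model trained by the image training device.
    q = np.asarray(extract_features(query_image), dtype=float)
    # L2-normalise so a dot product equals cosine similarity (assumption:
    # the claim does not specify the similarity measure).
    q /= np.linalg.norm(q) + 1e-12
    db = db_features / (np.linalg.norm(db_features, axis=1, keepdims=True) + 1e-12)
    scores = db @ q                     # one similarity score per database image
    top = np.argsort(-scores)[:top_k]   # highest-scoring images first
    return [(db_ids[i], float(scores[i])) for i in top]
```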
  8. The system according to claim 7, wherein the rate update unit is specifically configured to:
    update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
  9. The system according to claim 8, wherein the rate update unit is specifically configured to:
    update the learning rate corresponding to each element of the model parameters, wherein, when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:
    η_j^{k+1} = Δw_j^{k+1} / (g_j^{k+1} − g_j^k)
    where η_j^{k+1} denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, Δw_j^{k+1} denotes the model parameter change amount corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, g_j^{k+1} denotes the first gradient corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, and g_j^k denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
PCT/CN2015/076967 2014-10-24 2015-04-20 Model parameter training method, device and system WO2016062044A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410579249.2 2014-10-24
CN201410579249.2A CN104346629B (en) 2014-10-24 2014-10-24 A kind of model parameter training method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2016062044A1 true WO2016062044A1 (en) 2016-04-28

Family

ID=52502192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/076967 WO2016062044A1 (en) 2014-10-24 2015-04-20 Model parameter training method, device and system

Country Status (2)

Country Link
CN (1) CN104346629B (en)
WO (1) WO2016062044A1 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346629B (en) * 2014-10-24 2018-01-12 开源物联网(广州)有限公司 A kind of model parameter training method, apparatus and system
CN106408037B (en) * 2015-07-30 2020-02-18 阿里巴巴集团控股有限公司 Image recognition method and device
CN108074215B (en) * 2016-11-09 2020-04-14 京东方科技集团股份有限公司 Image frequency-raising system, training method thereof, and image frequency-raising method
CN110348571B (en) * 2016-11-29 2024-03-29 华为技术有限公司 Neural network model training method, device, chip and system
CN109389412B (en) * 2017-08-02 2022-03-04 创新先进技术有限公司 Method and device for training model, service equipment and user equipment
CN109800884B (en) * 2017-11-14 2023-05-26 阿里巴巴集团控股有限公司 Model parameter processing method, device, equipment and computer storage medium
CN108334947A (en) * 2018-01-17 2018-07-27 上海爱优威软件开发有限公司 A kind of the SGD training methods and system of intelligent optimization
CN108287763A (en) * 2018-01-29 2018-07-17 中兴飞流信息科技有限公司 Parameter exchange method, working node and parameter server system
CN110187647A (en) * 2018-02-23 2019-08-30 北京京东尚科信息技术有限公司 Model training method and system
CN111273953B (en) * 2018-11-19 2021-07-16 Oppo广东移动通信有限公司 Model processing method, device, terminal and storage medium
CN109784490B (en) 2019-02-02 2020-07-03 北京地平线机器人技术研发有限公司 Neural network training method and device and electronic equipment
CN111679912A (en) * 2020-06-08 2020-09-18 广州汇量信息科技有限公司 Load balancing method and device of server, storage medium and equipment


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7286964B2 (en) * 2003-09-22 2007-10-23 Advanced Structure Monitoring, Inc. Methods for monitoring structural health conditions
CN100447808C (en) * 2007-01-12 2008-12-31 郑文明 Method for classification human facial expression and semantics judgement quantization method
CN101299234B (en) * 2008-06-06 2011-05-11 华南理工大学 Method for recognizing human eye state based on built-in type hidden Markov model
CN104008420A (en) * 2014-05-26 2014-08-27 中国科学院信息工程研究所 Distributed outlier detection method and system based on automatic coding machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030115325A1 (en) * 2001-12-18 2003-06-19 Ira Cohen Adapting Bayesian network parameters on-line in a dynamic environment
CN103020711A (en) * 2012-12-25 2013-04-03 中国科学院深圳先进技术研究院 Classifier training method and classifier training system
CN103971163A (en) * 2014-05-09 2014-08-06 哈尔滨工程大学 Adaptive learning rate wavelet neural network control method based on normalization lowest mean square adaptive filtering
CN104346629A (en) * 2014-10-24 2015-02-11 华为技术有限公司 Model parameter training method, device and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320026A (en) * 2017-05-16 2018-07-24 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN108320026B (en) * 2017-05-16 2022-02-11 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN110956018A (en) * 2019-11-22 2020-04-03 腾讯科技(深圳)有限公司 Training method of text processing model, text processing method, text processing device and storage medium
CN110956018B (en) * 2019-11-22 2023-04-18 腾讯科技(深圳)有限公司 Training method of text processing model, text processing method, text processing device and storage medium
CN111260079A (en) * 2020-01-17 2020-06-09 南京星火技术有限公司 Electronic device, agent self-training apparatus, and computer-readable medium
CN111260079B (en) * 2020-01-17 2023-05-19 南京星火技术有限公司 Electronic equipment and intelligent body self-training device
CN111325354A (en) * 2020-03-13 2020-06-23 腾讯科技(深圳)有限公司 Machine learning model compression method and device, computer equipment and storage medium
CN111325354B (en) * 2020-03-13 2022-10-25 腾讯科技(深圳)有限公司 Machine learning model compression method and device, computer equipment and storage medium
CN111400915A (en) * 2020-03-17 2020-07-10 桂林理工大学 Sand liquefaction discrimination method and device based on deep learning
CN113763501A (en) * 2021-09-08 2021-12-07 上海壁仞智能科技有限公司 Iteration method of image reconstruction model and image reconstruction method
CN113763501B (en) * 2021-09-08 2024-02-27 上海壁仞智能科技有限公司 Iterative method of image reconstruction model and image reconstruction method

Also Published As

Publication number Publication date
CN104346629A (en) 2015-02-11
CN104346629B (en) 2018-01-12

Similar Documents

Publication Publication Date Title
WO2016062044A1 (en) Model parameter training method, device and system
US11829880B2 (en) Generating trained neural networks with increased robustness against adversarial attacks
WO2018227800A1 (en) Neural network training method and device
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN111797893B (en) Neural network training method, image classification system and related equipment
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
Li et al. Learning balanced and unbalanced graphs via low-rank coding
WO2018095049A1 (en) Method and apparatus for generating recommended results
WO2016062095A1 (en) Video classification method and apparatus
CN113168559A (en) Automated generation of machine learning models
JP2018521382A (en) QUANTON representation for emulating quantum-like computations with classic processors
WO2021089013A1 (en) Spatial graph convolutional network training method, electronic device and storage medium
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
WO2023065859A1 (en) Item recommendation method and apparatus, and storage medium
JP5881048B2 (en) Information processing system and information processing method
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
WO2015188395A1 (en) Big data oriented metabolome feature data analysis method and system thereof
CN112529068B (en) Multi-view image classification method, system, computer equipment and storage medium
WO2022088390A1 (en) Image incremental clustering method and apparatus, electronic device, storage medium and program product
Ye et al. Efficient point cloud segmentation with geometry-aware sparse networks
Zhang et al. DATA: Differentiable architecture approximation with distribution guided sampling
Chen et al. Distribution knowledge embedding for graph pooling
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
Meirom et al. Optimizing tensor network contraction using reinforcement learning
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15853368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15853368

Country of ref document: EP

Kind code of ref document: A1