WO2016062044A1 - Model parameter training method, device and system - Google Patents

Model parameter training method, device and system

Info

Publication number
WO2016062044A1
WO2016062044A1 · PCT/CN2015/076967 (CN2015076967W)
Authority
WO
WIPO (PCT)
Prior art keywords
model parameter
gradient
parameter
update
objective function
Application number
PCT/CN2015/076967
Other languages
French (fr)
Chinese (zh)
Inventor
唐胜
万吉
柴振华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2016062044A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to communication technologies, and in particular, to a model parameter training method, apparatus and system.
  • the traditional method of retrieving images based on keywords suffers from a semantic gap problem, which often causes users to fail to retrieve the images they want.
  • the Content Based Image Retrieval (CBIR) method is a retrieval approach closer to human thinking.
  • the current CBIR system relies mainly on some shallow machine learning algorithms, and its performance is greatly limited.
  • Deep Learning is the most prominent direction in machine learning in recent years.
  • its motivation is to build and simulate neural networks that analyze and learn in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images, sounds and text.
  • the concept of deep learning stems from the research of artificial neural networks, and its basic learning structure is a multi-layer neural network.
  • deep learning mimics the "deep" layer learning structure of the human brain through multiple transformation and expression steps. By exploring the deep structure, progressively more abstract hierarchical features can be learned from the data.
  • deep learning has aroused widespread attention in academia and industry, resulting in a series of Deep Neural Network (DNN) models, such as Deep Belief Nets (DBNs), Deep Boltzmann Machines (DBMs), and Convolutional Neural Networks (CNNs).
  • the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and then solved by the corresponding optimization algorithm.
  • the optimization problem to be solved is defined as $w^* = \arg\min_{w} L(w)$, where $L(w) = \sum_{x \in X} l(w; x)$, in which:
  • w is the model parameter
  • X is the training data set
  • l(w;x) is the cost function.
  • the goal of the solution is to find a set of optimal model parameters $w^*$ such that the model has the lowest total cost on the training data set.
  • l(w;x) is usually related to the classification error rate, so minimizing the objective function L(w) is equivalent to minimizing the classification error rate.
  • L(w) is usually a complex nonlinear function, and often the global optimal solution w * cannot be obtained, but only the local optimal solution can be obtained.
  • the solution to the problem needs to be iterated on the training data.
  • the commonly used methods are stochastic gradient descent method, Newton method and quasi-Newton method.
  • Stochastic Gradient Descent is an optimization method widely used in deep learning.
  • its advantages are that it is easy to implement, fast, and usable for large-scale training sets.
  • the basic process of the stochastic gradient descent method is: iteratively calculate the cost function using the initial model parameters, judge whether the result of the iterative calculation satisfies the termination condition, and if not, update the model parameters according to the preset learning rate and the current gradient value, continuing the iterative calculation until the result satisfies the termination condition (a minimal sketch of this baseline follows this list).
  • a disadvantage of the prior-art stochastic gradient descent method is that manual parameter selection is required, including the learning rate, the termination condition, and the like.
  • when the learning rate is set too small, the training process is very slow; when the learning rate is set too large, the update may skip over the local optimal solution during iterative calculation, so that convergence slows down instead of speeding up, or the iteration even fails to converge.
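For illustration, here is a minimal sketch of this fixed-rate baseline. The cost function, the sampling callable, and the termination test are placeholders introduced here, not definitions from the patent.

```python
import numpy as np

def sgd(grad, sample, w0, lr=0.01, tol=1e-4, max_iter=100000):
    """Plain stochastic gradient descent with a fixed, manually chosen learning rate.

    grad(w, x): gradient of the cost l(w; x) on one training sample x.
    sample():   draws one training sample from the training set X.
    Termination combines a gradient-norm test with an iteration cap;
    the patent leaves the exact termination condition open.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad(w, sample())
        if np.linalg.norm(g) < tol:     # one possible termination condition
            break
        w -= lr * g                     # fixed learning rate: the drawback at issue
    return w
```

The fixed `lr` here is exactly what the embodiments below replace with an adaptively updated, per-element learning rate.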
  • embodiments of the present invention provide a model parameter training method, apparatus, and system for rapidly performing parameter training for image retrieval or image classification.
  • the objective function is iteratively calculated using the model parameters; the objective function is a cost function for image training.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function includes:
  • the learning rate is updated according to the gradient of the objective function on the previous model parameters and the first gradient.
  • the step of updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient includes: updating the learning rate corresponding to each element of the model parameters according to the per-element formula given in the claims below.
  • a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit;
  • the computing unit is configured to iteratively calculate an objective function using a model parameter, the objective function being a cost function for performing image training;
  • the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit and the rate update unit are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired;
  • the gradient determining unit is configured to determine a first gradient of the objective function on the model parameter
  • the rate update unit is configured to update a learning rate according to a parameter distribution feature that is displayed in the objective function by the model parameter;
  • the parameter updating unit is configured to update the model parameter according to the learning rate and the first gradient, and trigger the calculating unit and the termination determining unit.
  • the rate update unit is specifically configured to update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient, and further, per element, according to the formula given in the claims below.
  • an image training device, a retrieval device, and an image database;
  • the image training device includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using model parameters, where the objective function is a cost function for performing image training;
  • the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit and the rate update unit are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
  • the retrieval device is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the image retrieval result.
  • the rate update unit is specifically configured to update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient, and further, per element, according to the formula given in the claims below.
  • in the iterative process of the embodiments of the present invention, if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation continues; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameters in the objective function,
  • and the learning rate is then used to update the model parameters for the next iterative calculation, so that the variation range of the model parameters can be adaptively adjusted according to the parameter distribution characteristics of the objective function.
  • when far from a local optimum of the model parameters, a larger variation range of the model parameters can be set through the learning rate to speed up the iterative calculation.
  • when close to a local optimum, a smaller variation range of the model parameters can be set by updating the learning rate, improving the efficiency of the iterative calculation and, in turn, the speed of image training.
  • FIG. 1 is a schematic structural diagram of an image retrieval apparatus in an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a model parameter training method in an embodiment of the present invention.
  • FIG. 3 is another schematic flowchart of a model parameter training method in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 5 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 6 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 7 is another schematic diagram of an objective function curve in an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a convergence test in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of an image training apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram showing the computer structure of an image training apparatus based on a model parameter training method according to an embodiment of the present invention.
  • the model parameter training method in the embodiment of the present invention is applied to the image retrieval system shown in FIG. 1, specifically:
  • in practical applications, in order for a computer to output the results a person wants when searching, a computer device is required to perform deep learning, that is, to establish and simulate a neural network for analysis and learning in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images.
  • the training device simulates the deep learning structure of the human brain through multiple transformation and expression steps; by exploring the deep structure, progressively more abstract hierarchical features can be learned from the data. Therefore, in order to realize deep learning, the image training device 11 is provided in the image retrieval system to perform training on massive data and determine the model parameters for image retrieval.
  • image data is input to the retrieval device 12 of the image retrieval system; the retrieval device 12 performs neural network feature extraction on the image data according to the model parameters determined by the image training device 11, performs a comparison search of images in the image database 13 according to the neural network features, and outputs the result of the image retrieval (a sketch of this flow follows this list).
  • the results of the image retrieval may be output in descending order of image similarity.
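As a schematic illustration of this retrieval flow, the sketch below ranks database images by cosine similarity to the query. The similarity measure and the `extract_features` callable (standing in for the trained network applied with the learned model parameters) are assumptions, since the patent only specifies that results are output in descending order of similarity.

```python
import numpy as np

def retrieve(query_image, db_features, db_ids, extract_features, top_k=10):
    """Rank database images by similarity to the query image.

    extract_features: the trained neural network, parameterized by the
    model parameters produced by the image training device.
    db_features: one feature row per database image.
    """
    q = extract_features(query_image)                 # 1-D feature vector
    norms = np.linalg.norm(db_features, axis=1) * np.linalg.norm(q) + 1e-12
    sims = db_features @ q / norms                    # cosine similarity (assumed)
    order = np.argsort(-sims)                         # descending similarity
    return [(db_ids[i], float(sims[i])) for i in order[:top_k]]
```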
  • the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and the problem is then solved by a corresponding optimization algorithm; the goal of the solution is to find a set of optimal model parameters such that the model has the lowest cost on the training data set.
  • the stochastic gradient descent method can be used to solve the optimal model parameters.
  • however, with the stochastic gradient descent method alone, the speed of image training is not ideal.
  • in the embodiments of the present invention, the model parameter training method is optimized and improved on the basis of the stochastic gradient descent method.
  • the model parameters used in the current iterative calculation are the model parameters updated after the previous iteration calculation.
  • the model parameter used in the current iterative calculation is used as the first model parameter
  • the model parameter used in the previous iterative calculation is used as the second model parameter
  • the gradient of the objective function on the first model parameter is used as the first gradient
  • a gradient of the objective function on the second model parameter is used as a second gradient.
  • in the first iterative calculation, the initial model parameters are used as the first model parameters.
  • alternatively, the initial model parameters are updated for the first time using an initial learning rate, and the updated model parameters are taken as the first model parameters of the first iterative calculation.
  • the model parameter training method in the embodiments of the present invention is applied to the iterative calculations after this "first update of the initial model parameters".
  • an embodiment of a method for training a model parameter in an embodiment of the present invention includes:
  • the image training device iteratively calculates an objective function using a first model parameter, the objective function being a cost function for performing image training.
  • assume the mapping of the input images through the neural network is $\{\phi_w(q), \phi_w(q^+), \phi_w(q^-)\}$, where $\phi_w(q)$, $\phi_w(q^+)$ and $\phi_w(q^-)$ are all one-dimensional column vectors used as image feature representations; a cost function can then be defined over such triplets (an illustrative form is sketched below).
  • the cost function may also have other representations, which need to be determined according to actual needs and are not limited herein.
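The patent does not reproduce the exact triplet cost in this text; purely as an illustration, one common hinge-style triplet cost over the features $\phi_w(q)$, $\phi_w(q^+)$, $\phi_w(q^-)$ is sketched below. The margin value and the squared-distance form are assumptions, not the patent's formula.

```python
import numpy as np

def triplet_cost(phi_q, phi_pos, phi_neg, margin=1.0):
    """Hinge-style triplet cost: the query should be closer to the positive
    image than to the negative image by at least `margin`. One common form,
    not necessarily the one used in the patent."""
    d_pos = np.sum((phi_q - phi_pos) ** 2)   # squared distance to positive
    d_neg = np.sum((phi_q - phi_neg) ** 2)   # squared distance to negative
    return max(0.0, margin + d_pos - d_neg)
```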
  • after the image training device iteratively calculates the objective function using the first model parameter, it determines whether the result of the current iterative calculation satisfies the termination condition; if not, it determines the first gradient of the objective function on the model parameter and updates the learning rate according to the parameter distribution characteristics exhibited by the model parameter in the objective function. The learning rate is used to determine the update amplitude of the first model parameter.
  • the "parameter distribution feature exhibited by the model parameter in the objective function" may be expressed as the gradient change at the corresponding parameter point on the function curve of the objective function.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values; for another example, it may be terminated when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • step 201 and step 202 are triggered in sequence until the result of the iterative calculation satisfies the termination condition, then the iterative calculation is stopped, and the first model parameter that satisfies the termination condition is acquired.
  • if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation continues; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameter in the objective function, and the learning rate is then used to update the model parameters used in the next iterative calculation, so that the variation range of the model parameters can be adaptively adjusted according to the parameter distribution characteristics of the objective function.
  • when far from a local optimum of the model parameters, a larger variation range of the model parameters can be set through the learning rate to speed up the iterative calculation; when close to a local optimum, a smaller variation range can be set by updating the learning rate, improving the efficiency of the iterative calculation and, in turn, the speed of image training.
  • the iterative calculation on training data can also use the Newton method and the quasi-Newton method, but the second-order partial derivatives and the Hessian matrix need to be calculated in the process, the computational complexity is high, and sometimes the Hessian matrix of the objective function cannot remain positive definite, causing the Newton method or quasi-Newton method to fail.
  • the model parameter determination method proposed by the embodiments of the present invention requires neither second-derivative information nor the calculation (or approximate calculation) of the Hessian matrix, so it is more efficient than the Newton and quasi-Newton methods and can also be used to solve other unconstrained, constrained, or large-scale nonlinear optimization problems.
  • the subscript k indicates that the parameter corresponds to the k-th (current) iterative calculation
  • the superscript j indicates the j-th element of the corresponding parameter vector.
  • another embodiment of the method for determining a model parameter in the embodiment of the present invention includes:
  • the image training device iteratively calculates an objective function using a first model parameter, which is a cost function for performing image training.
  • after the image training device iteratively calculates the objective function using the first model parameter, it determines whether the result of the current iterative calculation satisfies the termination condition; if yes, it stops the iterative calculation and acquires the first model parameter that satisfies the termination condition; if no, step 303 is performed.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the image training device determines a first gradient of the objective function on the model parameter and updates the learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; the learning rate is used to determine the update amplitude of the first model parameter.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the gradient value $g_k$ of the objective function $L(w)$ at the first model parameter $w_k$ may specifically be $g_k = \nabla_w L(w)\big|_{w=w_k}$.
  • the learning rate is updated according to the second gradient, the model parameter change amount, and the first gradient; specifically, the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update is

    $\eta_{k+1}^{j} = \dfrac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$

  • the model parameter change amount $\Delta w_{k+1}^{j}$ is the absolute value of the difference between an element of the first model parameter and the element at the corresponding order or position in the second model parameter (a sketch of this per-element update follows).
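Here is a sketch of this per-element learning-rate update, using the formula as reconstructed above; the small `eps` guarding against a zero gradient difference is an implementation safeguard added here, not part of the patent text.

```python
import numpy as np

def update_learning_rate(w_curr, w_prev, g_curr, g_prev, eps=1e-12):
    """Per-element rate: eta[j] = |w_k[j] - w_{k-1}[j]| / |g_k[j] - g_{k-1}[j]|.

    w_curr, w_prev: the first and second model parameters (current and previous).
    g_curr, g_prev: the first and second gradients at those parameters.
    """
    delta_w = np.abs(w_curr - w_prev)    # model parameter change amount
    delta_g = np.abs(g_curr - g_prev)    # per-element gradient change
    return delta_w / (delta_g + eps)     # updated per-element learning rate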
  • for comparison, the method for updating the first model parameter in the stochastic gradient descent method is $w_{k+1} = w_k - \eta_k g_k$.
  • in this embodiment, the learning rate $\eta_k$ is proportional to the absolute value of the model parameter change amount, $|\Delta w_k|$,
  • where $\lambda_k$ is the proportionality parameter between the learning rate and the change amount of the model parameters (Equation 8).
  • combining Equation 8 with the parameter update rule yields the relationship for the learning rate $\eta_k$ given by the per-element formula above (Equation 9).
  • the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • step 301 and step 302 are then triggered in sequence until the result of the iterative calculation satisfies the termination condition, at which point the iterative calculation stops and the first model parameter that satisfies the termination condition is acquired (the steps are assembled into a sketch below).
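Assembling the steps of this embodiment, a minimal sketch of the whole adaptive training loop follows. The initial learning rate, gradient routine, and gradient-norm termination test are placeholders; the per-element rate update matches the formula reconstructed above.

```python
import numpy as np

def train(grad, w0, eta0=0.01, tol=1e-4, max_iter=100000, eps=1e-12):
    """Adaptive-rate training loop (a sketch, assuming the reconstructed rule).

    grad(w): gradient of the objective L at w (the first gradient).
    The initial model parameters are updated once with the initial rate eta0;
    thereafter each element uses eta[j] = |dw[j]| / |dg[j]|.
    """
    w_prev = np.asarray(w0, dtype=float).copy()
    g_prev = grad(w_prev)
    w = w_prev - eta0 * g_prev                 # first update with the initial rate
    for _ in range(max_iter):
        g = grad(w)                            # first gradient at current parameters
        if np.linalg.norm(g) < tol:            # one possible termination condition
            break
        eta = np.abs(w - w_prev) / (np.abs(g - g_prev) + eps)  # per-element rate
        w_prev, g_prev = w, g
        w = w - eta * g                        # amplitude from eta, direction from g
    return w
```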
  • point A is the parameter point corresponding to the (k-1)-th iterative calculation
  • point B is the parameter point corresponding to the k-th iterative calculation
  • point C is the parameter point corresponding to a local optimum value of the objective function. According to Equation 9, the next iteration (the (k+1)-th) falls between point A and point B, near point A, adaptively approaching the local optimal parameter point C (a toy numerical illustration follows).
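As a toy numerical illustration (all numbers invented for exposition, not taken from the patent): suppose at point A ($w_{k-1} = 0$) the gradient is $g_{k-1} = -2$, and at point B ($w_k = 1$) it is $g_k = 3$; the sign change of the gradient means a local optimum C lies between A and B. The updated rate is $\eta = |w_k - w_{k-1}| / |g_k - g_{k-1}| = 1/5$, so the next parameter is $w_{k+1} = w_k - \eta\, g_k = 1 - 0.6 = 0.4$, which indeed falls between A and B, on the A side of the midpoint.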
  • point A is the parameter point corresponding to the (k-1)-th iterative calculation
  • point B is the parameter point corresponding to the k-th iterative calculation
  • point C is the parameter point corresponding to a local optimum value of the objective function.
  • an image retrieval experiment was performed on the Paris data set.
  • the data set has 6,412 images and contains 11 landmarks in Paris; for each landmark, 5 images were selected for use as queries.
  • the CNNs features are first learned on the ImageNet dataset, and the model is then tuned on the Paris dataset using both SGD and the method of the present invention. Since the model contains about 60 million parameters, neither the Newton method nor the quasi-Newton method can be used for model training; therefore only the method of the present invention and the currently widely used SGD method were compared in the experiment. The comparison covers the convergence speed of SGD and the proposed method during model tuning, and the mean average precision (mAP) of the tuned model in the image retrieval task (the mAP metric is sketched below).
  • FIG. 8 compares the convergence speed of training under the SGD algorithm and the model parameter training method in the embodiment of the present invention. Since training uses randomly selected triplets, the loss function fluctuates considerably, so the average of the last hundred iterations is taken to smooth the convergence curve. It can be seen that the convergence speed of the model parameter training method in the embodiment of the present invention is significantly faster than that of the SGD algorithm, and its iterative error is much lower than that of SGD; after 10,000 iterations its error already reaches the final convergence error that SGD attains only after 100,000 iterations (0.0125). That is, under the same error termination condition, the model parameter training method in the embodiment of the present invention is about 10 times faster.
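For reference, here is a sketch of how mAP over the query set might be computed; this is the standard definition of mean average precision, assumed here since the patent does not spell out the metric.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance is a 0/1 sequence in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_hits * rel).sum() / rel.sum())

def mean_average_precision(all_ranked_relevance):
    """mAP: the mean of AP over all queries (e.g., the 55 Paris queries)."""
    return float(np.mean([average_precision(r) for r in all_ranked_relevance]))
```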
  • an embodiment of an image training apparatus in an embodiment of the present invention includes:
  • a computing unit 901, a termination determining unit 902, a gradient determining unit 903, a rate updating unit 904, and a parameter updating unit 905;
  • the calculating unit 901 is configured to perform an iterative calculation on the objective function using a model parameter, where the objective function is a cost function for performing image training;
  • the termination determining unit 902 is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determining unit 903 and the rate updating unit 904 are executed; if yes, the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition are acquired;
  • the gradient determining unit 903 is configured to determine a first gradient of the objective function on the model parameter
  • the rate update unit 904 is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;
  • the parameter updating unit 905 is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit 901 and the termination determination unit 902.
  • rate update unit 904 is specifically configured to:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the calculation unit 901 performs an iterative calculation on the objective function using the first model parameter, which is a cost function for performing image training.
  • the termination determination unit 902 determines whether the result of the current iterative calculation satisfies the termination condition, and if not, executes the gradient determination unit 903 and the rate update unit 904.
  • the termination condition may take multiple forms. For example, the iterative calculation may be terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take more forms in practical applications, which are not specifically limited herein.
  • the gradient determining unit 903 determines a first gradient based on the objective function, the first gradient being a gradient of the objective function at the first model parameter.
  • the gradient value $g_k$ of the objective function $L(w)$ at the first model parameter $w_k$ may specifically be $g_k = \nabla_w L(w)\big|_{w=w_k}$.
  • the rate update unit 904 updates the learning rate according to the parameter distribution features exhibited by the model parameters in the objective function, the learning rate being used to determine the update magnitude of the first model parameter.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated according to the gradient of the objective function on the previous model parameter and the first gradient;
  • specifically, the learning rate is updated according to the second gradient, the model parameter change amount, and the first gradient, per the per-element formula given above.
  • the parameter updating unit 905 updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
  • the calculation unit 901 is then triggered again, and the iterative calculation continues on the objective function using the updated first model parameter until the result of the iterative calculation satisfies the termination condition, at which point the iteration stops and the first model parameter that satisfies the termination condition is acquired.
  • FIG. 10 is a schematic structural diagram of an image training device 20 according to an embodiment of the present invention.
  • the image training device 20 can include an input device 210, an output device 220, a processor 230, and a memory 240.
  • the image training device 20 provided by the embodiment of the present invention is applied to a stream computing system, where the stream computing system is configured to process services; the stream computing system includes a master node and a plurality of working nodes, and the master node schedules the sub-services included in a service to the plurality of working nodes for processing.
  • Memory 240 can include read only memory and random access memory and provides instructions and data to processor 230. A portion of memory 240 may also include non-volatile random access memory (NVRAM).
  • the memory 240 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof:
  • operation instructions: include various operation instructions for implementing various operations.
  • operating system: includes various system programs for implementing basic services and handling hardware-based tasks.
  • the processor 230 performs the following operations by calling an operation instruction stored in the memory 240 (the operation instruction can be stored in the operating system):
  • the processor 230 is specifically configured to: iteratively calculate the objective function using the first model parameter, where the objective function is a cost function for performing image training; if the result of the iterative calculation does not satisfy the termination condition, determine the first gradient of the objective function on the model parameter, and update the learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; update the first model parameter according to the learning rate and the first gradient; and repeat the above steps until the result of the iterative calculation satisfies the termination condition, acquiring the first model parameter that satisfies the termination condition.
  • updating the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the processor 230 controls the operation of the image training device 20; the processor 230 may also be referred to as a CPU (Central Processing Unit).
  • the various components of the image training device 20 are coupled together by a bus system 250, where the bus system 250 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are labeled as bus system 250 in the figure.
  • Processor 230 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 230 or an instruction in a form of software.
  • the processor 230 described above may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention may be implemented or carried out by such a processor.
  • the general purpose processor may be a microprocessor, or any conventional processor.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 240, and the processor 230 reads the information in the memory 240 and performs the steps of the above method in combination with its hardware.
  • an embodiment of an image retrieval system in an embodiment of the present invention includes:
  • an image training device 11, a retrieval device 12, and an image database 13;
  • the image training device 11 includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
  • the retrieval device 12 is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database 13 according to the neural network features, and output the result of the image retrieval.
  • rate update unit is specifically configured to:
  • the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
  • the disclosed apparatus and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • in addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

A model parameter training method, device and system, used for rapidly carrying out parameter training for image retrieval or image classification. The method comprises: using model parameters to carry out iterative computation on an objective function, the objective function being a cost function used for image training; if the result of the iterative computation does not meet the termination condition, determining the first gradient of the objective function on the model parameters, and updating the learning rate according to the parameter distribution characteristics of the model parameters in the objective function; updating the model parameters according to the learning rate and the first gradient; repeating the previous steps until the result of the iterative computation meets the termination condition; and obtaining the model parameters corresponding to the result of the iterative computation that meets the termination condition.

Description

Model parameter training method, device and system

The present application claims priority to Chinese Patent Application No. 201410579249.2, entitled "A Model Parameter Training Method, Apparatus and System", filed with the Chinese Patent Office on October 24, 2014, the entire contents of which are incorporated herein by reference.
Technical field

The present invention relates to communication technologies, and in particular, to a model parameter training method, apparatus and system.
Background

The traditional method of retrieving images based on keywords suffers from a semantic gap problem, which often causes users to fail to retrieve the images they want. The Content Based Image Retrieval (CBIR) method is a retrieval approach closer to human thinking. Current CBIR systems rely mainly on shallow machine learning algorithms, and their performance is greatly limited. Deep Learning is the most prominent direction in machine learning in recent years. Its motivation is to build and simulate neural networks that analyze and learn in the manner of the human brain, mimicking the mechanisms of the human brain to interpret data such as images, sounds and text. The concept of deep learning stems from research on artificial neural networks, and its basic learning structure is a multi-layer neural network. Unlike the "shallow" learning structures of traditional machine learning algorithms, deep learning mimics the "deep" learning structure of the human brain through multiple transformation and expression steps. By exploring the deep structure, progressively more abstract hierarchical features can be learned from the data.

Deep learning has aroused widespread attention in academia and industry, resulting in a series of Deep Neural Network (DNN) models, such as Deep Belief Nets (DBNs), Deep Boltzmann Machines (DBMs), and Convolutional Neural Networks (CNNs).

Studying efficient learning algorithms for deep neural networks to realize fast training on massive data is the first problem to be solved in the research and development of deep learning technology. The study of learning algorithms for deep neural networks is therefore particularly important.
In the process of image training by a machine, the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and the problem is then solved by a corresponding optimization algorithm. The optimization problem to be solved is defined as follows:

$$w^* = \arg\min_{w} L(w), \qquad L(w) = \sum_{x \in X} l(w; x)$$

where w is the model parameter, X is the training data set, and l(w;x) is the cost function. The goal of the solution is to find a set of optimal model parameters w* such that the model has the lowest total cost on the training data set. Taking the classification problem as an example, l(w;x) is usually related to the classification error rate, so minimizing the objective function L(w) is equivalent to minimizing the classification error rate.

In particular, in deep learning, L(w) is usually a complex nonlinear function; often the global optimal solution w* cannot be obtained, and only a local optimal solution can be obtained. The solution of the problem needs to proceed iteratively on the training data; commonly used methods are the stochastic gradient descent method, the Newton method, and the quasi-Newton method.
In the prior art, Stochastic Gradient Descent (SGD) is an optimization method widely adopted in deep learning. Its advantages are that it is easy to implement, fast, and usable for large-scale training sets.

The basic process of the stochastic gradient descent method is: iteratively calculate the cost function using the initial model parameters, judge whether the result of the iterative calculation satisfies the termination condition, and if not, update the model parameters according to the preset learning rate and the current gradient value, continuing the iterative calculation until the result of the iterative calculation satisfies the termination condition.

A disadvantage of the prior-art stochastic gradient descent method is that manual parameter selection is required, including the learning rate, the termination condition, and the like. When the learning rate is set too small, the training process is very slow; when the learning rate is set too large, the update may skip over the local optimal solution during iterative calculation, so that convergence slows down instead of speeding up, or the iteration even fails to converge.
Summary of the invention

Embodiments of the present invention provide a model parameter training method, apparatus, and system for rapidly performing parameter training for image retrieval or image classification.

The model parameter training method provided by the first aspect of the embodiments of the present invention includes:

iteratively calculating an objective function using model parameters, the objective function being a cost function for performing image training;

if the result of the iterative calculation does not satisfy a termination condition, determining a first gradient of the objective function on the model parameters, and updating a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;

updating the model parameters according to the learning rate and the first gradient;

repeating the above steps until the result of the iterative calculation satisfies the termination condition, and acquiring the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition.
With reference to the first aspect, in a first possible implementation, updating the learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function includes:

updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.

With reference to the first possible implementation of the first aspect, in a second possible implementation, updating the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient includes:

updating the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
The image training apparatus provided by the second aspect of the embodiments of the present invention includes:

a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit;

the calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training;

the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition;

the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters;

the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function;

the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.

With reference to the second aspect, in a first possible implementation, the rate update unit is specifically configured to:

update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the rate update unit is specifically configured to:

update the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
The image retrieval system provided by the third aspect of the embodiments of the present invention includes:

an image training device, a retrieval device, and an image database;

the image training device includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for performing image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition, and if not, to execute the gradient determination unit and the rate update unit, or if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameters in the objective function; and the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit. The retrieval device is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the result of the image retrieval.

With reference to the third aspect, in a first possible implementation, the rate update unit is specifically configured to:

update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
With reference to the first possible implementation of the third aspect, in a second possible implementation, the rate update unit is specifically configured to:

update the learning rate corresponding to each element of the model parameters; when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:

$$\eta_{k+1}^{j} = \frac{\Delta w_{k+1}^{j}}{\left|g_{k+1}^{j} - g_{k}^{j}\right|}$$

where $\eta_{k+1}^{j}$ denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, $\Delta w_{k+1}^{j}$ denotes the model parameter change amount corresponding to the j-th element at the (k+1)-th model parameter update, $g_{k+1}^{j}$ denotes the first gradient corresponding to the j-th element at the (k+1)-th model parameter update, and $g_{k}^{j}$ denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
It can be seen from the above technical solutions that the embodiments of the present invention have the following advantages:
In the iterative process of the embodiments of the present invention, if the result of an iterative calculation does not satisfy the termination condition, the iterative calculation continues. Before the next iterative calculation, the learning rate is updated according to the parameter distribution characteristics that the model parameter exhibits in the objective function, and the learning rate is then used to update the model parameter used by the next iterative calculation, so that the step size of the model parameter is adapted to the parameter distribution characteristics of the objective function. When far from a local optimum of the model parameter, the learning rate can set a large parameter change to speed up the iterative calculation; when close to a local optimum, the updated learning rate can set a small parameter change. This improves the efficiency of the iterative calculation and, in turn, the speed of image training.
DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic architectural diagram of an image retrieval system according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a model parameter training method according to an embodiment of the present invention;
FIG. 3 is another schematic flowchart of a model parameter training method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 5 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 6 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 7 is another schematic diagram of an objective function curve according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a convergence test according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an image training apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the computer structure of an image training apparatus based on the model parameter training method according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Referring to FIG. 1, the model parameter training method in the embodiments of the present invention is applied to the image retrieval system shown in FIG. 1. Specifically:
In practical applications, to enable a computer to output the results a user wants when retrieving images, a computer device needs to perform deep learning, that is, to build and simulate a neural network that analyzes and learns in the manner of the human brain and interprets data by imitating its mechanisms. The image training device imitates the deep layered learning structure of the human brain through multiple transformation and expression steps; by exploring this deep structure, gradually more abstract hierarchical features can be learned from the data. Therefore, to implement deep learning, an image training device 11 is provided in the image retrieval system to perform training on massive data and determine the model parameters used for image retrieval.
When a user needs to perform image retrieval, image data is input into the retrieval device 12 of the image retrieval system. The retrieval device 12 performs neural network feature extraction on the image data according to the model parameters determined by the image training device 11, performs a comparative search of images in the image database 13 according to the neural network features, and outputs the result of the image retrieval. Specifically, the results may be output in descending order of image similarity.
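As a rough sketch of this retrieval flow (the function name `retrieve`, the `(N, d)` feature-matrix layout, and the dot-product similarity are illustrative assumptions consistent with the feature representation used below, not part of the claimed embodiment):

```python
import numpy as np

def retrieve(query_feature, db_features, top_k=10):
    """Rank database images by similarity to a query image.

    query_feature: 1-D feature vector extracted from the query image.
    db_features:   (N, d) matrix of features of the N database images.
    Returns the indices of the top_k images in descending similarity.
    """
    scores = db_features @ query_feature   # dot-product similarity
    return np.argsort(-scores)[:top_k]     # descending order of similarity
```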
In the process of image training by the image training device 11, the image retrieval problem to be solved is first abstracted into an optimization problem and an objective function is defined; the objective function is then solved by a corresponding optimization algorithm. The goal of the solution is to find a set of optimal model parameters such that the model has the lowest total cost on the training data set.
In the prior art, the stochastic gradient descent (SGD) method can be used to solve for the optimal model parameters; however, the resulting speed of image training is not ideal. The model parameter training method in the embodiments of the present invention optimizes and improves this stochastic gradient descent method; for details, refer to the following embodiments:
In practical applications, when performing the iterative calculation of the objective function, an initial model parameter and an initial learning rate need to be set. Specifically, except for the first iterative calculation, the model parameter used in the current iterative calculation is always the model parameter updated after the previous iterative calculation. For ease of description, in the embodiments of the present invention, the model parameter used in the current iterative calculation is referred to as the first model parameter, and the model parameter used in the previous iterative calculation is referred to as the second model parameter; the gradient of the objective function at the first model parameter is referred to as the first gradient, and the gradient of the objective function at the second model parameter is referred to as the second gradient.
When the iterative calculation is performed for the first time, the initial model parameter serves as the first model parameter. When the first iterative calculation does not satisfy the termination condition, the initial learning rate is used to update the initial model parameter for the first time, and the updated model parameter serves as the first model parameter of the next iterative calculation. The model parameter training method of the embodiments of the present invention is applied in the iterative calculations that follow this first update of the initial model parameter.
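A minimal sketch of this bootstrapping step, assuming a gradient oracle `grad` is available; the names `w0` and `eta0` for the initial model parameter and the initial learning rate are illustrative:

```python
def first_update(w0, eta0, grad):
    """Perform the first, plain update of the initial model parameter, so
    that a previous parameter change and a previous gradient exist for the
    adaptive updates of the following embodiments."""
    g0 = grad(w0)            # gradient at the initial model parameter
    w1 = w0 - eta0 * g0      # first update with the initial learning rate
    dw0 = w1 - w0            # model parameter change of this first update
    return w1, g0, dw0       # first model parameter, second gradient, change
```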
Referring to FIG. 2, an embodiment of the model parameter training method in the embodiments of the present invention includes:
201. Perform an iterative calculation on an objective function using a first model parameter.
The image training device iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training. Exemplarily, taking metric learning as an example, define w as the convolutional neural network parameters, and let the input image x be a triplet of three pictures, $x = \{q, q^+, q^-\}$, where $(q, q^+)$ is a similar image pair and $(q, q^-)$ is a dissimilar image pair. The mapping of the input images through the neural network is $\{\varphi_w(q), \varphi_w(q^+), \varphi_w(q^-)\}$, where $\varphi_w(q)$, $\varphi_w(q^+)$, and $\varphi_w(q^-)$ are all one-dimensional column vectors used as image feature representations. The cost function may then be:
$$l(w, x) = \max\left(0,\ \gamma - \varphi_w(q) \cdot \varphi_w(q^+) + \varphi_w(q) \cdot \varphi_w(q^-)\right);$$
It can be understood that, in practical applications, the cost function may also take other forms, depending on actual requirements, which is not limited herein.
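For illustration, a numpy sketch of this triplet cost; the feature vectors are assumed to be computed elsewhere by the network φ_w, and `gamma` stands for the margin γ:

```python
import numpy as np

def triplet_cost(phi_q, phi_pos, phi_neg, gamma=1.0):
    """l(w, x) = max(0, gamma - phi(q).phi(q+) + phi(q).phi(q-)) for one
    triplet, where phi_* are the 1-D feature vectors of (q, q+, q-)."""
    return max(0.0, gamma - phi_q @ phi_pos + phi_q @ phi_neg)

q, qp, qn = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.9, 0.1])
print(triplet_cost(q, qp, qn))   # 0.44: positive because the margin is violated
```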
202. If the result of the iterative calculation does not satisfy a termination condition, determine the first gradient and update the learning rate.
After the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition. If not, it determines the first gradient of the objective function at the model parameter and updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, "the parameter distribution characteristics that the model parameter exhibits in the objective function" may be expressed as the gradient change at the corresponding parameter point on the graph of the objective function.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
203. Update the first model parameter according to the learning rate and the first gradient.
The image training device updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, step 201 and step 202 are triggered in sequence, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
In the iterative process of this embodiment of the present invention, if the result of an iterative calculation does not satisfy the termination condition, the iterative calculation continues. Before the next iterative calculation, the learning rate is updated according to the parameter distribution characteristics that the model parameter exhibits in the objective function, and the learning rate is then used to update the model parameter used by the next iterative calculation, so that the step size of the model parameter is adapted to the parameter distribution characteristics of the objective function. When far from a local optimum of the model parameter, the learning rate can set a large parameter change to speed up the iterative calculation; when close to a local optimum, the updated learning rate can set a small parameter change. This improves the efficiency of the iterative calculation and, in turn, the speed of image training.
In practical applications, the iterative calculation on training data may also use the Newton method or quasi-Newton methods; however, these require computing second-order partial derivatives and the Hessian matrix, which is computationally expensive, and sometimes the Hessian matrix of the objective function cannot remain positive definite, causing the Newton or quasi-Newton method to fail. The model parameter determination method proposed in the embodiments of the present invention requires neither second-derivative information nor the computation or approximation of the Hessian matrix, and is therefore more efficient than the Newton and quasi-Newton methods; it can also be used to solve other unconstrained, constrained, or large-scale nonlinear optimization problems.
The model parameter determination method in the embodiments of the present invention is described in detail below. In the embodiments of the present invention, the subscript k denotes a quantity associated with the iterative calculation currently being performed, and the superscript j denotes a quantity associated with the j-th element of the first model parameter. Referring to FIG. 3, another embodiment of the model parameter determination method in the embodiments of the present invention includes:
301. Perform an iterative calculation on the objective function using the first model parameter.
The image training device iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training.
302. Determine whether the result of the iterative calculation satisfies the termination condition.
After the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition. If so, the iterative calculation is stopped and the first model parameter that satisfies the termination condition is obtained; if not, step 303 is performed.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
303. Determine the first gradient and update the learning rate.
The image training device determines the first gradient of the objective function at the model parameter, and updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
Exemplarily, the gradient value $g_k$ of the objective function L(w) at the first model parameter $w_k$ may be computed as:
$$g_k = L'(w_k)$$
Exemplarily, updating the learning rate according to the second gradient, the model parameter change amount, and the first gradient is specifically: updating the learning rate corresponding to each element of the first model parameter, wherein, when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1.
Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
Specifically, for one element of the model parameter, the model parameter change amount is the difference between the element in the first model parameter and the element at the corresponding position in the second model parameter, taken in absolute value.
The derivation of Formula 1 is described in detail below:
In practical applications, the first model parameter in the stochastic gradient descent method is updated as:
Formula 2: $w_{k+1} = w_k - \eta_k g_k$;
Rearranging Formula 2 gives Formula 3, in which the model parameter change amount $\Delta w_k$ of the model parameter w is:
Formula 3: $\Delta w_k = w_{k+1} - w_k = -\eta_k g_k$;
Because the change of Δw is continuous, the learning rate $\eta_k$ is proportional to the absolute value $\left|\Delta w_{k-1}\right|$ of the model parameter change amount of the previous iterative calculation, the relationship being:
Formula 4: $\eta_k = \lambda_k \left|w_k - w_{k-1}\right| = \lambda_k \left|\Delta w_{k-1}\right|$;
where $\lambda_k$ is the proportionality parameter between the learning rate and the model parameter change amount.
From Formula 3 and Formula 4, the relationship between $\Delta w_k$ and $\lambda_k$ is obtained:
Formula 5: $\Delta w_k = -\lambda_k \left|\Delta w_{k-1}\right| g_k$;
Further, from Formula 5:
Formula 6: $w_{k+1} = w_k + \Delta w_k = w_k - \lambda_k \left|\Delta w_{k-1}\right| g_k$;
When the j-th element of the first model parameter needs to be processed, converting Formula 6 to the per-element form and taking the proportionality parameter of the j-th element to be inversely proportional to the change in the gradient gives:
Formula 7:
$$\lambda_k^j = \frac{1}{\left|g_k^j - g_{k-1}^j\right|}$$
Substituting Formula 7 into Formula 5 gives:
Formula 8:
$$\Delta w_k^j = -\frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}\, g_k^j$$
Combining Formula 8 and Formula 3, the relationship for the learning rate $\eta_k$ is obtained:
Formula 1:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
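Putting Formulas 1 and 8 together gives one element-wise update step; the sketch below assumes numpy arrays, and the small `eps` added to the denominator (to guard against a zero gradient difference) is an assumption the derivation leaves implicit:

```python
import numpy as np

def adaptive_update(w_k, g_k, g_prev, dw_prev, eps=1e-12):
    """eta_k = |dw_prev| / |g_k - g_prev| per element (Formula 1), then
    dw_k = -eta_k * g_k (Formula 8); returns the new parameters and dw_k."""
    eta_k = np.abs(dw_prev) / (np.abs(g_k - g_prev) + eps)
    dw_k = -eta_k * g_k
    return w_k + dw_k, dw_k
```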
304. Update the first model parameter according to the learning rate and the first gradient.
The image training device updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, step 301 and step 302 are triggered in sequence, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
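A sketch of the complete loop of steps 301 to 304, assuming loss and gradient oracles; the two termination conditions shown (a loss threshold `tol` and an iteration cap `max_iter`) follow the examples mentioned above, and their concrete values are illustrative:

```python
import numpy as np

def train(w0, eta0, loss, grad, tol=1e-4, max_iter=100000, eps=1e-12):
    """Iterate steps 301-304 until a termination condition is satisfied."""
    g_prev = grad(w0)                  # second gradient
    dw_prev = -eta0 * g_prev           # first, plain update of the parameters
    w = w0 + dw_prev
    for _ in range(max_iter):          # step 301: iterative calculation
        if loss(w) <= tol:             # step 302: termination condition
            break
        g = grad(w)                    # step 303: first gradient
        eta = np.abs(dw_prev) / (np.abs(g - g_prev) + eps)  # Formula 1
        dw_prev = -eta * g             # step 304: update the parameters
        w = w + dw_prev
        g_prev = g
    return w
```

On a one-dimensional quadratic such as L(w) = w², this loop reaches the minimum in two adaptive steps, because the gradient difference |g_k - g_{k-1}| then equals the exact second derivative times |Δw_{k-1}|, so Formula 1 reproduces a Newton-like step.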
The effectiveness of the learning rate update in this embodiment of the present invention is analyzed below, taking the one-dimensional case (that is, j = 1; the multi-dimensional case follows by analogy, and the superscript j is dropped for brevity) as an example:
First, when $g_k \times g_{k-1} < 0$, it can be seen from Formula 8 that:
Formula 9:
$$\left|\Delta w_k\right| = \frac{\left|g_k\right|}{\left|g_k\right| + \left|g_{k-1}\right|}\left|\Delta w_{k-1}\right|$$
When $\left|g_k\right| = \left|g_{k-1}\right|$, refer to FIG. 4: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\left|\Delta w_k\right| = \frac{1}{2}\left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls exactly in the middle between point A and point B, adaptively approaching the local optimal parameter point C.
When $\left|g_k\right| < \left|g_{k-1}\right|$, refer to FIG. 5: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\left|\Delta w_k\right| < \frac{1}{2}\left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls between point A and point B, close to point B, adaptively approaching the local optimal parameter point C.
When $\left|g_k\right| > \left|g_{k-1}\right|$, refer to FIG. 6: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 9, $\frac{1}{2}\left|\Delta w_{k-1}\right| < \left|\Delta w_k\right| < \left|\Delta w_{k-1}\right|$, so the next iteration (the (k+1)-th iteration) falls between point A and point B, close to point A, adaptively approaching the local optimal parameter point C.
Second, when $g_k \times g_{k-1} > 0$, it can be seen from Formula 8 that:
Formula 10:
$$\eta_k = \frac{\left|\Delta w_{k-1}\right|}{\left|\,\left|g_k\right| - \left|g_{k-1}\right|\,\right|}$$
Refer to FIG. 7: point A is the parameter point corresponding to the (k-1)-th iterative calculation, point B is the parameter point corresponding to the k-th iterative calculation, and point C is the parameter point corresponding to a local optimum of the objective function. From Formula 10, the larger the absolute value of $\left|g_{k-1}\right| - \left|g_k\right|$, that is, the larger the change between the current gradient and the previous gradient, the smaller the value of $\eta_k$, so the learning rate adaptively decreases; and vice versa.
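The opposite-sign cases above can be checked numerically from Formula 9; the gradient values below are arbitrary examples with |Δw_{k-1}| = 1:

```python
# |dw_k| = |g_k| / (|g_k| + |g_prev|) * |dw_prev|  when g_k * g_prev < 0
for g_prev, g_k in [(-2.0, 2.0), (-2.0, 1.0), (-2.0, 3.0)]:
    ratio = abs(g_k) / (abs(g_k) + abs(g_prev))
    print(ratio)   # 0.5 (midpoint), ~0.33 (closer to B), 0.6 (closer to A)
```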
To verify the effectiveness of the model parameter training method in the embodiments of the present invention, an image retrieval experiment was performed on the Paris dataset. This dataset contains 6,412 images covering 11 landmark buildings (landmarks) in Paris, with 5 images selected per landmark as queries. CNN features were first learned on the ImageNet dataset, and then learning and adjustment (model tuning) were performed on the Paris dataset using SGD and the method of the present invention. Because the model contains about 60 million parameters, neither the Newton method nor quasi-Newton methods can be used for model training; therefore, the experiment compares only the method of the present invention with the currently widely used SGD method. The convergence speed of SGD and of the proposed method during model tuning, as well as the mean average precision (mAP) of the tuned model on the image retrieval task, were compared.
FIG. 8 compares the training convergence speed of the SGD algorithm and of the model parameter training method in the embodiments of the present invention during model tuning. Because training uses randomly sampled triplets, the loss function fluctuates considerably, so the average over the most recent one hundred iterations is taken to smooth the convergence curve. It can be seen that the convergence speed of the model parameter training method in the embodiments of the present invention is significantly faster than that of the SGD algorithm, and its iteration error (hinge loss) is far lower than that of SGD: at 10,000 iterations, the error already reaches the final convergence error of SGD after 100,000 iterations (0.0125). That is, under the same error termination condition, the model parameter training method in the embodiments of the present invention is 10 times faster.
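The smoothing described here is a plain moving average; a sketch, assuming the per-iteration loss values are stored in an array, with the window of one hundred iterations taken from the text:

```python
import numpy as np

def smooth(losses, window=100):
    """Average each point over the most recent `window` iterations to smooth
    the fluctuating triplet-loss curve before plotting."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(losses, dtype=float), kernel, mode="valid")
```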
The image training apparatus that implements the model parameter training method in the embodiments of the present invention is described below. It should be noted that the methods described in the foregoing embodiments of the model parameter training method can be implemented in the image training apparatus of the present invention. Referring to FIG. 9, an embodiment of the image training apparatus in the embodiments of the present invention includes:
a calculation unit 901, a termination determination unit 902, a gradient determination unit 903, a rate update unit 904, and a parameter update unit 905;
the calculation unit 901 is configured to iteratively calculate an objective function using a model parameter, where the objective function is a cost function used for image training;
the termination determination unit 902 is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit 903 and the rate update unit 904 are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained;
the gradient determination unit 903 is configured to determine a first gradient of the objective function at the model parameter;
the rate update unit 904 is configured to update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function;
the parameter update unit 905 is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit 901 and the termination determination unit 902.
Further, the rate update unit 904 is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
In the process of updating the learning rate, the learning rate corresponding to each element of the first model parameter is updated; when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1.
Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
The workflow of the units in this embodiment of the present invention is described below:
The calculation unit 901 iteratively calculates the objective function using the first model parameter, where the objective function is a cost function used for image training.
After the objective function has been iteratively calculated using the first model parameter, the termination determination unit 902 determines whether the result of the current iterative calculation satisfies the termination condition; if not, the gradient determination unit 903 and the rate update unit 904 are executed.
Specifically, in practical applications, the termination condition may take multiple forms. For example, the iterative calculation terminates when the calculation result of the objective function at the first model parameter falls within a certain numerical range; as another example, the iterative calculation terminates when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may take still more forms in practical applications, which is not specifically limited herein.
The gradient determination unit 903 determines the first gradient according to the objective function, where the first gradient is the gradient of the objective function at the first model parameter. Exemplarily, the gradient value $g_k$ of the objective function L(w) at the first model parameter $w_k$ may be computed as:
$$g_k = L'(w_k)$$
The rate update unit 904 updates the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; the learning rate is used to determine the update magnitude of the first model parameter.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes: updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
Exemplarily, updating the learning rate according to the second gradient, the model parameter change amount, and the first gradient is specifically: updating the learning rate corresponding to each element of the first model parameter; when the j-th element of the first model parameter is processed, the learning rate is updated according to Formula 1. Formula 1 is:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
The parameter update unit 905 updates the first model parameter according to the learning rate and the first gradient. Specifically, the learning rate may be used to determine the update magnitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
After the update of the first model parameter is completed, the calculation unit 901 is triggered again to continue the iterative calculation on the objective function using the updated first model parameter, until the result of the iterative calculation satisfies the termination condition; the iterative calculation is then stopped, and the first model parameter that satisfies the termination condition is obtained.
FIG. 10 is a schematic structural diagram of an image training apparatus 20 according to an embodiment of the present invention. The image training apparatus 20 may include an input device 210, an output device 220, a processor 230, and a memory 240.
The image training apparatus 20 provided in this embodiment of the present invention is applied to a stream computing system, where the stream computing system is configured to schedule and process services; the stream computing system includes a master control node and multiple working nodes, and the master control node is configured to schedule the sub-services included in a service to the multiple working nodes for processing.
The memory 240 may include a read-only memory and a random access memory, and provide instructions and data to the processor 230. A part of the memory 240 may further include a non-volatile random access memory (NVRAM).
The memory 240 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof:
operation instructions: including various operation instructions, used to implement various operations;
an operating system: including various system programs, used to implement various basic services and handle hardware-based tasks.
In this embodiment of the present invention, the processor 230 performs the following operations by invoking the operation instructions stored in the memory 240 (the operation instructions may be stored in the operating system):
The processor 230 is specifically configured to: iteratively calculate an objective function using a first model parameter, where the objective function is a cost function used for image training; if the result of the iterative calculation does not satisfy a termination condition, determine a first gradient of the objective function at the model parameter, and update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function; update the first model parameter according to the learning rate and the first gradient; and repeat the foregoing steps until the result of the iterative calculation satisfies the termination condition, to obtain the first model parameter that satisfies the termination condition.
Specifically, updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function includes:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
The processor 230 controls the operation of the image training apparatus 20; the processor 230 may also be referred to as a CPU (Central Processing Unit). The memory 240 may include a read-only memory and a random access memory, and provide instructions and data to the processor 230. A part of the memory 240 may further include a non-volatile random access memory (NVRAM). In a specific application, the components of the image training apparatus 20 are coupled together by a bus system 250, where the bus system 250 may further include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. However, for clarity of description, the various buses are all marked as the bus system 250 in the figure.
The methods disclosed in the foregoing embodiments of the present invention may be applied to the processor 230, or implemented by the processor 230. The processor 230 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 230 or by instructions in the form of software. The processor 230 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present invention may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 240, and the processor 230 reads the information in the memory 240 and completes the steps of the foregoing methods in combination with its hardware.
The image retrieval system that implements the model parameter training method in the embodiments of the present invention is described below. It should be noted that the methods described in the foregoing embodiments of the model parameter training method can be implemented in the image retrieval system of the present invention. Referring to FIG. 1, an embodiment of the image retrieval system in the embodiments of the present invention includes:
an image training device 11, a retrieval device 12, and an image database 13;
the image training device 11 includes a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit. The calculation unit is configured to iteratively calculate an objective function using a model parameter, where the objective function is a cost function used for image training. The termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit and the rate update unit are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained. The gradient determination unit is configured to determine a first gradient of the objective function at the model parameter. The rate update unit is configured to update a learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function. The parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
The retrieval device 12 is configured to perform neural network feature extraction on input image data according to the model parameters determined by the image training device, perform image retrieval in the image database 13 according to the neural network features, and output the result of the image retrieval.
Further, the rate update unit is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient;
update the learning rate corresponding to each element of the first model parameter, wherein, when the j-th element of the first model parameter is processed, the learning rate is updated according to the following formula:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
where $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
For the specific operations of the image retrieval system in the embodiments of the present invention, reference may be made to the foregoing embodiments; details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the described apparatus embodiments are merely exemplary. For example, the unit division is merely logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one position or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A model parameter training method, comprising:
iteratively calculating an objective function using a model parameter, wherein the objective function is a cost function used for image training;
if the result of the iterative calculation does not satisfy a termination condition,
determining a first gradient of the objective function at the model parameter, and updating a learning rate according to parameter distribution characteristics that the model parameter exhibits in the objective function;
updating the model parameter according to the learning rate and the first gradient; and
repeating the foregoing steps until the result of the iterative calculation satisfies the termination condition, and acquiring the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition.
2. The method according to claim 1, wherein updating the learning rate according to the parameter distribution characteristics that the model parameter exhibits in the objective function comprises:
updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
3. The method according to claim 2, wherein updating the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient comprises:
updating the learning rate corresponding to each element of the model parameter, wherein, when the j-th element of the model parameter is processed, the learning rate is updated according to the following formula:
$$\eta_k^j = \frac{\left|\Delta w_{k-1}^j\right|}{\left|g_k^j - g_{k-1}^j\right|}$$
wherein $\eta_k^j$ denotes the learning rate for the j-th element of the model parameter at the (k+1)-th model parameter update; $\left|\Delta w_{k-1}^j\right|$ denotes the model parameter change amount of the j-th element used in the (k+1)-th model parameter update, that is, the change produced by the previous update; $g_k^j$ denotes the first gradient for the j-th element of the model parameter at the (k+1)-th model parameter update; $g_{k-1}^j$ denotes the gradient for the j-th element of the previous model parameter at the k-th model parameter update; k is an integer greater than zero; and j is an integer greater than or equal to zero.
4. An image training apparatus, comprising:
a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit, wherein
the calculation unit is configured to iteratively calculate an objective function using a model parameter, wherein the objective function is a cost function used for image training;
the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, the gradient determination unit and the rate update unit are executed; if so, the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition is obtained;
the gradient determination unit is configured to determine a first gradient of the objective function at the model parameter;
the rate update unit is configured to update a learning rate according to parameter distribution characteristics that the model parameter exhibits in the objective function; and
the parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
5. The apparatus according to claim 4, wherein the rate update unit is specifically configured to:
update the learning rate according to the gradient of the objective function at the previous model parameter and the first gradient.
  6. The device according to claim 5, wherein the rate update unit is specifically configured to:
    update the learning rate corresponding to each element of the model parameters, wherein, when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:
    η_j^{k+1} = Δw_j^{k+1} / (g_j^{k+1} − g_j^k)
    where η_j^{k+1} denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, Δw_j^{k+1} denotes the model parameter change amount corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, g_j^{k+1} denotes the first gradient corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, and g_j^k denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
  7. An image retrieval system, comprising:
    an image training device, a retrieval device, and an image database;
    the image training device comprises: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using model parameters, the objective function being a cost function for image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to execute the gradient determination unit and the rate update unit; if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameters; the rate update unit is configured to update a learning rate according to the parameter distribution characteristics exhibited by the model parameters in the objective function; the parameter update unit is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit;
    the retrieval device is configured to perform neural network feature extraction on input image data according to the model parameters determined by the image training device, perform image retrieval in the image database according to the neural network features, and output the result of the image retrieval.
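On the retrieval side, claim 7 reduces to extracting neural network features for a query image with the trained model and ranking the image database by feature similarity. The sketch below assumes a feature-extraction callable, a precomputed database feature matrix, and cosine similarity as the ranking measure; none of these specifics are fixed by the claim.

```python
import numpy as np

def retrieve(extract_features, query_image, db_features, db_ids, top_k=10):
    # Feature extraction with the model trained by the image training device.
    q = np.asarray(extract_features(query_image), dtype=float)
    # L2-normalise so a dot product equals cosine similarity (assumption:
    # the claim does not specify the similarity measure).
    q /= np.linalg.norm(q) + 1e-12
    db = db_features / (np.linalg.norm(db_features, axis=1, keepdims=True) + 1e-12)
    scores = db @ q                     # one similarity score per database image
    top = np.argsort(-scores)[:top_k]   # highest-scoring images first
    return [(db_ids[i], float(scores[i])) for i in top]
```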
  8. The system according to claim 7, wherein the rate update unit is specifically configured to:
    update the learning rate according to the gradient of the objective function on the previous model parameters and the first gradient.
  9. The system according to claim 8, wherein the rate update unit is specifically configured to:
    update the learning rate corresponding to each element of the model parameters, wherein, when the j-th element of the model parameters is processed, the learning rate is updated according to the following formula:
    η_j^{k+1} = Δw_j^{k+1} / (g_j^{k+1} − g_j^k)
    where η_j^{k+1} denotes the learning rate corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, Δw_j^{k+1} denotes the model parameter change amount corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, g_j^{k+1} denotes the first gradient corresponding to the j-th element of the model parameters at the (k+1)-th model parameter update, and g_j^k denotes the gradient corresponding to the j-th element of the previous model parameters at the k-th model parameter update; k is an integer greater than zero, and j is an integer greater than or equal to zero.
PCT/CN2015/076967 2014-10-24 2015-04-20 Model parameter training method, device and system WO2016062044A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410579249.2 2014-10-24
CN201410579249.2A CN104346629B (en) 2014-10-24 2014-10-24 A kind of model parameter training method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2016062044A1 true WO2016062044A1 (en) 2016-04-28

Family

ID=52502192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/076967 WO2016062044A1 (en) 2014-10-24 2015-04-20 Model parameter training method, device and system

Country Status (2)

Country Link
CN (1) CN104346629B (en)
WO (1) WO2016062044A1 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346629B (en) * 2014-10-24 2018-01-12 开源物联网(广州)有限公司 A kind of model parameter training method, apparatus and system
CN106408037B (en) * 2015-07-30 2020-02-18 阿里巴巴集团控股有限公司 Image recognition method and device
CN108074215B (en) * 2016-11-09 2020-04-14 京东方科技集团股份有限公司 Image frequency-raising system, training method thereof, and image frequency-raising method
CN110348571B (en) * 2016-11-29 2024-03-29 华为技术有限公司 Neural network model training method, device, chip and system
CN109389412B (en) * 2017-08-02 2022-03-04 创新先进技术有限公司 Method and device for training model, service equipment and user equipment
CN109800884B (en) * 2017-11-14 2023-05-26 阿里巴巴集团控股有限公司 Model parameter processing method, device, equipment and computer storage medium
CN108334947A (en) * 2018-01-17 2018-07-27 上海爱优威软件开发有限公司 A kind of the SGD training methods and system of intelligent optimization
CN108287763A (en) * 2018-01-29 2018-07-17 中兴飞流信息科技有限公司 Parameter exchange method, working node and parameter server system
CN110187647A (en) * 2018-02-23 2019-08-30 北京京东尚科信息技术有限公司 Model training method and system
CN111273953B (en) * 2018-11-19 2021-07-16 Oppo广东移动通信有限公司 Model processing method, device, terminal and storage medium
CN109784490B (en) 2019-02-02 2020-07-03 北京地平线机器人技术研发有限公司 Neural network training method and device and electronic equipment
CN111679912A (en) * 2020-06-08 2020-09-18 广州汇量信息科技有限公司 Load balancing method and device of server, storage medium and equipment


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7286964B2 (en) * 2003-09-22 2007-10-23 Advanced Structure Monitoring, Inc. Methods for monitoring structural health conditions
CN100447808C (en) * 2007-01-12 2008-12-31 郑文明 Method for classification human facial expression and semantics judgement quantization method
CN101299234B (en) * 2008-06-06 2011-05-11 华南理工大学 Method for recognizing human eye state based on built-in type hidden Markov model
CN104008420A (en) * 2014-05-26 2014-08-27 中国科学院信息工程研究所 Distributed outlier detection method and system based on automatic coding machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030115325A1 (en) * 2001-12-18 2003-06-19 Ira Cohen Adapting Bayesian network parameters on-line in a dynamic environment
CN103020711A (en) * 2012-12-25 2013-04-03 中国科学院深圳先进技术研究院 Classifier training method and classifier training system
CN103971163A (en) * 2014-05-09 2014-08-06 哈尔滨工程大学 Adaptive learning rate wavelet neural network control method based on normalization lowest mean square adaptive filtering
CN104346629A (en) * 2014-10-24 2015-02-11 华为技术有限公司 Model parameter training method, device and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320026A (en) * 2017-05-16 2018-07-24 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN108320026B (en) * 2017-05-16 2022-02-11 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN110956018A (en) * 2019-11-22 2020-04-03 腾讯科技(深圳)有限公司 Training method of text processing model, text processing method, text processing device and storage medium
CN110956018B (en) * 2019-11-22 2023-04-18 腾讯科技(深圳)有限公司 Training method of text processing model, text processing method, text processing device and storage medium
CN111260079A (en) * 2020-01-17 2020-06-09 南京星火技术有限公司 Electronic device, agent self-training apparatus, and computer-readable medium
CN111260079B (en) * 2020-01-17 2023-05-19 南京星火技术有限公司 Electronic equipment and intelligent body self-training device
CN111325354A (en) * 2020-03-13 2020-06-23 腾讯科技(深圳)有限公司 Machine learning model compression method and device, computer equipment and storage medium
CN111325354B (en) * 2020-03-13 2022-10-25 腾讯科技(深圳)有限公司 Machine learning model compression method and device, computer equipment and storage medium
CN111400915A (en) * 2020-03-17 2020-07-10 桂林理工大学 Sand liquefaction discrimination method and device based on deep learning
CN113763501A (en) * 2021-09-08 2021-12-07 上海壁仞智能科技有限公司 Iteration method of image reconstruction model and image reconstruction method
CN113763501B (en) * 2021-09-08 2024-02-27 上海壁仞智能科技有限公司 Iterative method of image reconstruction model and image reconstruction method

Also Published As

Publication number Publication date
CN104346629A (en) 2015-02-11
CN104346629B (en) 2018-01-12

Similar Documents

Publication Publication Date Title
WO2016062044A1 (en) Model parameter training method, device and system
US11829880B2 (en) Generating trained neural networks with increased robustness against adversarial attacks
WO2018227800A1 (en) Neural network training method and device
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN111797893B (en) Neural network training method, image classification system and related equipment
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
Li et al. Learning balanced and unbalanced graphs via low-rank coding
WO2018095049A1 (en) Method and apparatus for generating recommended results
WO2016062095A1 (en) Video classification method and apparatus
CN113168559A (en) Automated generation of machine learning models
JP2018521382A (en) QUANTON representation for emulating quantum-like computations with classic processors
WO2021089013A1 (en) Spatial graph convolutional network training method, electronic device and storage medium
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
WO2023065859A1 (en) Item recommendation method and apparatus, and storage medium
JP5881048B2 (en) Information processing system and information processing method
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
WO2015188395A1 (en) Big data oriented metabolome feature data analysis method and system thereof
CN112529068B (en) Multi-view image classification method, system, computer equipment and storage medium
WO2022088390A1 (en) Image incremental clustering method and apparatus, electronic device, storage medium and program product
Ye et al. Efficient point cloud segmentation with geometry-aware sparse networks
Zhang et al. DATA: Differentiable architecture approximation with distribution guided sampling
Chen et al. Distribution knowledge embedding for graph pooling
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
Meirom et al. Optimizing tensor network contraction using reinforcement learning
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15853368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15853368

Country of ref document: EP

Kind code of ref document: A1