WO2016062044A1 - Model parameter training method, device and system
Model parameter training method, device and system
- Publication number
- WO2016062044A1 (PCT/CN2015/076967)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model parameter
- gradient
- parameter
- update
- objective function
- Prior art date
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00—Pattern recognition
        - G06F18/20—Analysing
          - G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- the present invention relates to communication technologies, and in particular, to a model parameter training method, apparatus and system.
- the traditional method of retrieving images based on keywords has a semantic gap problem, which often causes users to fail to retrieve the images they want to search for.
- the Content Based Image Retrieval (CBIR) method is a search method that is more similar to human thinking.
- the current CBIR system relies mainly on some shallow machine learning algorithms, and its performance is greatly limited.
- Deep Learning is the most eye-catching direction in machine learning in recent years.
- its motivation lies in establishing and simulating a neural network that analyses and learns like the human brain, mimicking the mechanisms of the human brain to interpret data such as images, sounds, and text.
- the concept of deep learning stems from the research of artificial neural networks, and its basic learning structure is a multi-layer neural network.
- deep learning mimics the "deep" layered learning structure of the human brain through multiple transformation and expression steps; by exploring the deep structure, progressively more abstract hierarchical features can be learned from the data.
- Deep learning has aroused widespread interest in academia and industry, resulting in a series of Deep Neural Network (DNN) models, such as Deep Belief Nets (DBNs), Deep Boltzmann Machines (DBMs), and Convolutional Neural Networks (CNNs).
- the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and then solved by the corresponding optimization algorithm.
- the optimization problem to be solved is as follows (the formula itself is elided in the extracted text; this reconstruction follows the surrounding definitions):

  w* = argmin_w L(w), where L(w) = Σ_{x∈X} l(w; x)

- w is the model parameter;
- X is the training data set;
- l(w;x) is the cost function.
- the goal of the solution is to find a set of optimal model parameters w* such that the model has the lowest total cost on the training data set.
- l(w;x) is usually related to the classification error rate, so minimizing the objective function L(w) is equivalent to minimizing the classification error rate.
- L(w) is usually a complex nonlinear function, and often the global optimal solution w * cannot be obtained, but only the local optimal solution can be obtained.
- the solution to the problem needs to be iterated on the training data.
- the commonly used methods are the stochastic gradient descent method, Newton's method, and quasi-Newton methods.
- Stochastic Gradient Descent is an optimization method widely used in deep learning.
- the advantage is that it is easy to implement, fast, and can be used for large-scale training sets.
- the basic process of the stochastic gradient descent method is: iteratively calculate the cost function using the initial model parameters; judge whether the result of the iterative calculation satisfies the termination condition; if not, update the model parameters according to the preset learning rate and the current gradient value, and continue the iterative calculation until the result of the iterative calculation satisfies the termination condition.
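- as a concrete illustration, the plain SGD loop just described might look like the following minimal Python sketch; the gradient function, data sampling, and thresholds are placeholder assumptions, not taken from the patent:

```python
import numpy as np

def sgd(grad_l, w0, data, lr=0.01, tol=1e-4, max_iter=100000):
    """Plain SGD: a fixed, manually chosen learning rate and termination check."""
    w = np.asarray(w0, dtype=float)
    for k in range(max_iter):
        x = data[np.random.randint(len(data))]   # pick one training sample
        g = grad_l(w, x)                         # gradient of l(w; x) at current w
        if np.linalg.norm(g) < tol:              # termination condition
            break
        w = w - lr * g                           # fixed-step parameter update
    return w
```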
- a disadvantage of the prior art stochastic gradient descent method is that manual parameter selection is required, including learning rate, termination conditions, and the like.
- when the learning rate is set too small, the training process will be very slow; when the learning rate is set too large, the update of the model parameters may skip over the local optimal solution, so that the convergence error does not fall, or the iteration even fails to converge.
- Embodiments of the present invention provide a model parameter training method, apparatus, and system for rapidly performing parameter training for image retrieval or image classification.
- the objective function is iteratively calculated using the model parameters, the objective function being a cost function for image training;
- the updating of the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the step of updating the learning rate according to a gradient of the objective function on a previous model parameter and the first gradient includes:
- a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit;
- the computing unit is configured to iteratively calculate an objective function using a model parameter, the objective function being a cost function for performing image training;
- the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to trigger the gradient determining unit and the rate update unit; if yes, to acquire the model parameters corresponding to the iterative calculation that satisfies the termination condition;
- the gradient determining unit is configured to determine a first gradient of the objective function on the model parameter
- the rate update unit is configured to update a learning rate according to a parameter distribution feature that is displayed in the objective function by the model parameter;
- the parameter updating unit is configured to update the model parameter according to the learning rate and the first gradient, and trigger the calculating unit and the termination determining unit.
- the rate update unit is specifically configured to:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the rate update unit is specifically configured to:
- an image training device, a retrieval device, and an image database;
- the image training device includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using a model parameter, where the objective function is a cost function for performing image training;
- the termination determining unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to trigger the gradient determining unit and the rate update unit; if yes, to acquire the model parameters corresponding to the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameter; the rate update unit is configured to update a learning rate according to a parameter distribution feature exhibited by the model parameter in the objective function; and the parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the computing unit and the termination determining unit.
- the searching device is configured to perform neural network feature extraction on the input image data according to the model parameter determined by the image training device, and perform image retrieval in the image database according to the neural network feature, and output the image retrieval result.
- the rate update unit is specifically configured to:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the rate update unit is specifically configured to:
- if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation is continued; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameter in the objective function, and the updated learning rate is then used to update the model parameters used in the next iterative calculation.
- the variation range of the model parameters can thus be adaptively adjusted according to the parameter distribution characteristics of the objective function: when the model parameters are far from the local optimum, a large variation range can be set through the learning rate to speed up the iterative calculation.
- setting the variation range of the model parameters by updating the learning rate improves the efficiency of the iterative calculation and, in turn, the speed of image training.
- FIG. 1 is a schematic structural diagram of an image retrieval apparatus in an embodiment of the present invention.
- FIG. 2 is a schematic flow chart of a model parameter training method in an embodiment of the present invention.
- FIG. 3 is another schematic flowchart of a model parameter training method in an embodiment of the present invention.
- FIG. 4 is a schematic diagram of an objective function curve in an embodiment of the present invention.
- FIG. 5 is another schematic diagram of an objective function curve in an embodiment of the present invention.
- FIG. 6 is another schematic diagram of an objective function curve in an embodiment of the present invention.
- FIG. 7 is another schematic diagram of an objective function curve in an embodiment of the present invention.
- FIG. 8 is a schematic diagram of a convergence test in an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of an image training apparatus according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram showing the computer structure of an image training apparatus based on a model parameter training method according to an embodiment of the present invention.
- a model parameter training method in the embodiment of the present invention is applied to the image retrieval system shown in FIG. 1 , specifically:
- In practical applications, in order for a computer to output the results a user expects when searching, a computer device is required to perform deep learning: establishing and simulating a neural network that analyses and learns like the human brain, mimicking the mechanisms of the human brain to interpret data such as images.
- the training device simulates the deep learning structure of the human brain through multiple transformation and expression steps; by exploring the deep structure, progressively more abstract hierarchical features can be learned from the data. Therefore, in order to realize deep learning, it is necessary to provide the image training device 11 in the image retrieval system, perform training on massive data, and determine model parameters for image retrieval.
- the image data is input into the retrieval device 12 of the image retrieval system; the retrieval device 12 performs neural network feature extraction on the image data according to the model parameters determined by the image training device 11, performs a comparative search of the images in the image database 13 according to the neural network features, and outputs the result of the image retrieval.
- the result of the image retrieval may be output in descending order according to the similarity of the image.
- the image retrieval problem to be solved is first abstracted into an optimization problem, the objective function is defined, and the problem is then solved by a corresponding optimization algorithm; the solution target is to find a set of optimal model parameters such that the model has the lowest cost on the training data set.
- the stochastic gradient descent method can be used to solve the optimal model parameters.
- however, with the plain stochastic gradient descent method, the speed of image training is not ideal.
- the model parameter training method in the embodiment of the present invention is an optimization and improvement of the stochastic gradient descent method.
- the model parameters used in the current iterative calculation are the model parameters updated after the previous iteration calculation.
- the model parameter used in the current iterative calculation is used as the first model parameter
- the model parameter used in the previous iterative calculation is used as the second model parameter
- a gradient of the objective function on the first model parameter is used as the first gradient
- a gradient of the objective function on the second model parameter is used as a second gradient.
- the initial model parameters are used as the first model parameters.
- the initial model parameters are updated for the first time using the initial learning rate, and the updated model parameters are taken as the first model parameters of the first iterative calculation; in this case the model parameter training method in the embodiment of the present invention is applied to the iterative calculations after the first update of the initial model parameters.
- an embodiment of a method for training a model parameter in an embodiment of the present invention includes:
- the image training device iteratively calculates an objective function using a first model parameter, which is a cost function for performing image training.
- the mapping of the input images through the neural network is {φ_w(q), φ_w(q+), φ_w(q-)}, where φ_w(q), φ_w(q+), φ_w(q-) are all one-dimensional column vectors used as image feature representations; the cost function can then be, for example, a hinge-style triplet loss (the exact expression is elided in the extracted text; this is one common instantiation, with q+ an image similar to q, q- a dissimilar one, and ε a margin):

  l(w; q, q+, q-) = max(0, ε + ||φ_w(q) − φ_w(q+)||² − ||φ_w(q) − φ_w(q-)||²)

- the cost function may also have other representations, which need to be determined according to actual needs and are not limited herein.
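- purely as an illustration of the hinge-style triplet cost sketched above (phi, eps, and the argument names are assumptions, not the patent's notation):

```python
import numpy as np

def triplet_cost(phi, w, q, q_pos, q_neg, eps=1.0):
    """Hinge-style triplet cost: pull q toward the similar image, push it from the dissimilar one."""
    f_q, f_pos, f_neg = phi(w, q), phi(w, q_pos), phi(w, q_neg)  # 1-D feature vectors
    d_pos = np.sum((f_q - f_pos) ** 2)    # squared distance to the similar image
    d_neg = np.sum((f_q - f_neg) ** 2)    # squared distance to the dissimilar image
    return max(0.0, eps + d_pos - d_neg)  # zero cost once the margin eps is satisfied
```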
- after the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition; if not, it determines the first gradient of the objective function on the model parameter and updates the learning rate according to the parameter distribution characteristics exhibited by the model parameter in the objective function; the learning rate is used to determine the update range of the first model parameter.
- the “parameter distribution feature exhibited by the model parameter in the objective function” may be expressed as the gradient change at the corresponding parameter point on the curve of the objective function.
- the termination condition may have multiple manifestations; for example, the iterative calculation is terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may have more representations in practical applications, and it is not specifically limited herein.
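- a minimal sketch of these two example termination conditions (the threshold names and values are placeholders, not from the patent):

```python
def should_terminate(loss, k, loss_tol=1e-3, max_iters=100000):
    """Terminate when the objective falls below a threshold or the iteration cap is hit."""
    return loss < loss_tol or k >= max_iters
```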
- the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
- step 201 and step 202 are triggered in sequence until the result of the iterative calculation satisfies the termination condition, then the iterative calculation is stopped, and the first model parameter that satisfies the termination condition is acquired.
- if the result of the iterative calculation does not satisfy the termination condition, the iterative calculation is continued; before the next iterative calculation, the learning rate is updated according to the parameter distribution feature exhibited by the model parameter in the objective function, and the updated learning rate is then used to update the model parameters used in the next iterative calculation.
- the variation range of the model parameters can thus be adaptively adjusted according to the parameter distribution characteristics of the objective function: when the model parameters are far from the local optimum, a large variation range can be set through the learning rate to speed up the iterative calculation. Setting the variation range of the model parameters by updating the learning rate improves the efficiency of the iterative calculation and, in turn, the speed of image training.
- the iterative calculation on training data can also use the Newton method or quasi-Newton methods, but the second-order partial derivatives and the Hessian matrix need to be calculated in the process, the computational complexity is high, and sometimes the Hessian matrix of the objective function cannot remain positive definite, which invalidates the Newton or quasi-Newton method.
- the model parameter determination method proposed by the embodiment of the present invention requires neither second-derivative information nor the exact or approximate calculation of the Hessian matrix, so it is more efficient than the Newton and quasi-Newton methods and can also be used to solve other unconstrained, constrained, or large-scale nonlinear optimization problems.
- the subscript k indicates that the parameter corresponds to the kth (current) iterative calculation
- the superscript j indicates the jth element of the model parameter.
- another embodiment of the method for determining a model parameter in the embodiment of the present invention includes:
- the image training device iteratively calculates an objective function using a first model parameter, which is a cost function for performing image training.
- after the image training device iteratively calculates the objective function using the first model parameter, the image training device determines whether the result of the current iterative calculation satisfies the termination condition; if yes, it stops the iterative calculation and acquires the first model parameter that satisfies the termination condition; if no, step 303 is performed.
- the termination condition may have multiple manifestations; for example, the iterative calculation is terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may have more representations in practical applications, and it is not specifically limited herein.
- the image training device determines a first gradient of the objective function on the model parameter and updates a learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; the learning rate is used to determine the update range of the first model parameter.
- the updating of the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the gradient value g_k of the objective function L(w) at the first model parameter w_k may specifically be the vector of partial derivatives of L(w) evaluated at w_k, i.e. g_k = ∇L(w_k), whose jth element is g_k^j = ∂L(w)/∂w^j evaluated at w = w_k.
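- where an analytic gradient is available it would be used directly; as an illustrative aside, g_k can also be approximated (or checked) by central finite differences, as in this sketch:

```python
import numpy as np

def numerical_gradient(L, w, h=1e-6):
    """Central-difference estimate of g_k = dL/dw at w, one element at a time."""
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (L(w + e) - L(w - e)) / (2 * h)  # jth partial derivative
    return g
```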
- the learning rate is updated according to the second gradient, the model parameter variation, and the first gradient; specifically, for the (k+1)th iterative calculation, the learning rate corresponding to the jth element of the model parameter is updated element by element.
- the model parameter change amount is the absolute value of the difference between an element in the first model parameter and the element at the corresponding order or position in the second model parameter, i.e. |w_k^j − w_{k−1}^j|.
- the method for updating the first model parameters in the stochastic gradient descent method is w_{k+1} = w_k − η_k·g_k.
- the learning rate η_k is proportional to the absolute value of the model parameter change, where λ_k is the proportionality parameter between the learning rate and the amount of change of the model parameters. Combining this with the stated dependence on the gradient change (the patent's equations 8 and 9 are elided in the extracted text, so the following is a secant-style reconstruction rather than the literal formula), the relationship of the rate η_k can be written as:

  η_{k+1}^j = λ_k · |w_k^j − w_{k−1}^j| / |g_k^j − g_{k−1}^j|
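- a minimal Python sketch of the resulting training loop, assuming the secant-style rate reconstruction above (λ is held constant and all names and thresholds are placeholder assumptions):

```python
import numpy as np

def adaptive_sgd(grad_L, w0, lr0=0.01, lam=0.1, tol=1e-4, max_iter=100000):
    """SGD with a per-element learning rate scaled by |dw| / |dg|."""
    w_prev = np.asarray(w0, dtype=float)
    g_prev = grad_L(w_prev)
    w = w_prev - lr0 * g_prev              # first update uses the initial learning rate
    for k in range(max_iter):
        g = grad_L(w)                      # first gradient: at the current parameters
        if np.linalg.norm(g) < tol:        # termination condition
            break
        lr = lam * np.abs(w - w_prev) / (np.abs(g - g_prev) + 1e-12)  # element-wise rate
        w_prev, g_prev = w, g
        w = w - lr * g                     # update with the adaptive rate
    return w
```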
- the image training device updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update amplitude of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
- step 301 and step 302 are triggered in sequence until the result of the iterative calculation satisfies the termination condition, then the iterative calculation is stopped, and the first model parameter that satisfies the termination condition is acquired.
- point A is the parameter point corresponding to the (k−1)th iterative calculation
- point B is the parameter point corresponding to the kth iterative calculation
- point C is the parameter point corresponding to a local optimum value of the objective function. According to the learning-rate update rule above (the patent's formula nine is elided in the extracted text), the next iteration (the (k+1)th) falls between point A and point B, near point A, adaptively approaching the local optimal parameter point C; a small worked example follows the figure description below.
- point A is the parameter point corresponding to the (k−1)th iterative calculation
- point B is the parameter point corresponding to the kth iterative calculation
- point C is the parameter point corresponding to a local optimum value of the objective function.
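- as a small worked example under the secant-style reconstruction above (the numbers are illustrative and not from the patent): take λ_k = 0.5, w_{k−1} = A = 1.0, w_k = B = 2.0, g_{k−1} = −1, and g_k = 4; then η_{k+1} = 0.5 · |2.0 − 1.0| / |4 − (−1)| = 0.1 and w_{k+1} = w_k − η_{k+1}·g_k = 2.0 − 0.4 = 1.6, which indeed falls between point A and point B.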
- an image retrieval experiment was performed on the Paris data set.
- the data set has 6,412 images and contains 11 landmarks in Paris; for each landmark, 5 images were selected for use as queries.
- the CNN features are first learned on the ImageNet dataset and then tuned (model tuning) on the Paris dataset using both SGD and the method of the present invention. Since the model contains about 60 million parameters, neither the Newton method nor quasi-Newton methods can be used for model training; therefore, only the method of the present invention and the currently widely used SGD method were compared in the experiment. The comparison covers the convergence speed of SGD and the proposed method during model tuning, and the mean average precision (mAP) of the tuned model in the image retrieval task.
- FIG. 8 compares the convergence speed of training under the SGD algorithm and under the model parameter training method in the embodiment of the present invention. Since the training uses randomly selected 3-tuples, the loss function fluctuates greatly, so the average of the last hundred iterations is taken to smooth the convergence curve. It can be seen that the convergence speed of the model parameter training method in the embodiment of the present invention is significantly faster than that of the SGD algorithm, and its iterative error is much lower than that of SGD: after 10,000 iterations its error already reaches the final convergence error that SGD attains only after 100,000 iterations (0.0125). That is, under the same error termination condition, the model parameter training method in the embodiment of the present invention is about 10 times faster.
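- for reference, the mean average precision (mAP) reported in such retrieval experiments can be computed as in this sketch (binary relevance labels are assumed; this is not part of the patent):

```python
import numpy as np

def average_precision(ranked_relevant):
    """AP for one query; ranked_relevant is a 0/1 array over the ranked results."""
    ranked_relevant = np.asarray(ranked_relevant, dtype=float)
    hits = np.cumsum(ranked_relevant)                          # relevant items seen so far
    precisions = hits / (np.arange(len(ranked_relevant)) + 1)  # precision at each rank
    n_rel = ranked_relevant.sum()
    return float((precisions * ranked_relevant).sum() / n_rel) if n_rel else 0.0

def mean_average_precision(per_query_rankings):
    """mAP: the mean of per-query average precision."""
    return float(np.mean([average_precision(r) for r in per_query_rankings]))
```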
- an embodiment of an image training apparatus in an embodiment of the present invention includes:
- a computing unit 901, a termination determining unit 902, a gradient determining unit 903, a rate updating unit 904, and a parameter updating unit 905;
- the calculating unit 901 is configured to perform an iterative calculation on the objective function using a model parameter, where the objective function is a cost function for performing image training;
- the termination determining unit 902 is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to trigger the gradient determining unit 903 and the rate updating unit 904; if yes, to acquire the model parameters corresponding to the result of the iterative calculation that satisfies the termination condition;
- the gradient determining unit 903 is configured to determine a first gradient of the objective function on the model parameter
- the rate update unit 904 is configured to update a learning rate according to a parameter distribution feature that is displayed in the target function by the model parameter;
- the parameter updating unit 905 is configured to update the model parameters according to the learning rate and the first gradient, and to trigger the calculation unit 901 and the termination determination unit 902.
- rate update unit 904 is specifically configured to:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the calculation unit 901 performs an iterative calculation on the objective function using the first model parameter, the objective function being a cost function for performing image training.
- the termination determination unit 902 determines whether the result of the current iterative calculation satisfies the termination condition, and if not, triggers the gradient determination unit 903 and the rate update unit 904.
- the termination condition may have multiple manifestations; for example, the iterative calculation is terminated when the calculation result of the objective function at the first model parameter falls within a certain range of values, or when the number of iterative calculations reaches a certain threshold. It can be understood that the termination condition may have more representations in practical applications, and it is not specifically limited herein.
- the gradient determining unit 903 determines a first gradient based on the objective function, the first gradient being a gradient of the objective function at the first model parameter.
- the gradient value g_k of the objective function L(w) at the first model parameter w_k may specifically be g_k = ∇L(w_k), the vector of partial derivatives of L(w) evaluated at w_k.
- the rate update unit 904 updates the learning rate according to the parameter distribution features exhibited by the model parameters in the objective function, the learning rate being used to determine the update magnitude of the first model parameter.
- the updating of the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the learning rate is updated according to the second gradient, the model parameter variation, and the first gradient, specifically in the element-wise manner described above.
- the parameter updating unit 905 updates the first model parameter according to the learning rate and the first gradient; specifically, the learning rate may be used to determine the update range of the first model parameter, and the first gradient may be used to determine the update direction of the first model parameter.
- the calculation unit 901 is then triggered again, and the iterative calculation is continued on the objective function using the updated first model parameter until the result of the iterative calculation satisfies the termination condition; the iterative calculation then stops, and the first model parameter that satisfies the termination condition is acquired.
- FIG. 10 is a schematic structural diagram of an image training device 20 according to an embodiment of the present invention.
- the image training device 20 can include an input device 210, an output device 220, a processor 230, and a memory 240.
- the image training device 20 provided by the embodiment of the present invention may be applied to a stream computing system, where the stream computing system is configured to process a service and includes a master node and a plurality of working nodes; the master node schedules each sub-service included in the service to the plurality of working nodes for processing.
- Memory 240 can include read only memory and random access memory and provides instructions and data to processor 230. A portion of memory 240 may also include non-volatile random access memory (NVRAM).
- the memory 240 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof:
- Operation instructions: include various operation instructions for implementing various operations.
- Operating system: includes various system programs for implementing various basic services and handling hardware-based tasks.
- the processor 230 performs the following operations by calling an operation instruction stored in the memory 240 (the operation instruction can be stored in the operating system):
- the processor 230 is specifically configured to: perform iterative calculation on the objective function using the first model parameter, where the objective function is a cost function for performing image training; if the result of the iterative calculation does not satisfy the termination condition, determine the first gradient of the objective function on the model parameter, and update the learning rate according to the parameter distribution characteristic exhibited by the model parameter in the objective function; update the first model parameter according to the learning rate and the first gradient; and repeat the above steps until the result of the iterative calculation satisfies the termination condition, acquiring the first model parameter that satisfies the termination condition.
- the updating of the learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function includes:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the processor 230 controls the operation of the image training device 20 and may also be referred to as a CPU (Central Processing Unit).
- the various components of the image training device 20 are coupled together by a bus system 250, where the bus system 250 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. However, for clarity of description, the various buses are all labeled as the bus system 250 in the figure.
- Processor 230 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 230 or an instruction in a form of software.
- the processor 230 described above may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention may be implemented or carried out by the processor described above.
- the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present invention may be directly executed by a hardware decoding processor, or may be executed by a combination of hardware and software modules in a decoding processor.
- the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
- the storage medium is located in the memory 240, and the processor 230 reads the information in the memory 240 and performs the steps of the above method in combination with its hardware.
- an embodiment of an image retrieval system in an embodiment of the present invention includes:
- an image training device 11, a retrieval device 12, and an image database 13;
- the image training device 11 includes: a calculation unit, a termination determination unit, a gradient determination unit, a rate update unit, and a parameter update unit; the calculation unit is configured to iteratively calculate an objective function using a model parameter, the objective function being a cost function for performing image training; the termination determination unit is configured to determine whether the result of the iterative calculation satisfies a termination condition; if not, to trigger the gradient determination unit and the rate update unit; if yes, to acquire the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition; the gradient determination unit is configured to determine a first gradient of the objective function on the model parameter; the rate update unit is configured to update a learning rate according to the parameter distribution feature exhibited by the model parameter in the objective function; and the parameter update unit is configured to update the model parameter according to the learning rate and the first gradient, and to trigger the calculation unit and the termination determination unit.
- the retrieval device 12 is configured to perform neural network feature extraction on the input image data according to the model parameters determined by the image training device, perform image retrieval in the image database 13 according to the neural network features, and output the result of the image retrieval.
- rate update unit is specifically configured to:
- the learning rate is updated based on a gradient of the objective function over a previous model parameter and the first gradient.
- the disclosed apparatus and method can be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the units is only a logical function division; in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the part of the technical solution of the present invention that is essential, or that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
- the software product includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
- the foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Abstract
A model parameter training method, apparatus, and system, used to rapidly perform parameter training for image retrieval or image classification. The method comprises the steps of: using model parameters to perform an iterative calculation on an objective function, the objective function being a cost function used for image training; if the result of the iterative calculation does not satisfy the termination condition, determining the first gradient of the objective function on the model parameters, and updating the learning rate according to the parameter distribution characteristics of the model parameters in the objective function; updating the model parameters according to the learning rate and the first gradient; repeating the preceding steps until the result of the iterative calculation satisfies the termination condition; and obtaining the model parameter corresponding to the result of the iterative calculation that satisfies the termination condition.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410579249.2A CN104346629B (zh) | 2014-10-24 | 2014-10-24 | Model parameter training method, device and system
CN201410579249.2 | 2014-10-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016062044A1 true WO2016062044A1 (fr) | 2016-04-28 |
Family
ID=52502192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/076967 WO2016062044A1 (fr) | 2014-10-24 | 2015-04-20 | Procédé, dispositif et système d'apprentissage de paramètres de modèle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104346629B (fr) |
WO (1) | WO2016062044A1 (fr) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346629B (zh) * | 2014-10-24 | 2018-01-12 | 开源物联网(广州)有限公司 | Model parameter training method, device and system |
CN106408037B (zh) * | 2015-07-30 | 2020-02-18 | 阿里巴巴集团控股有限公司 | Image recognition method and apparatus |
CN108074215B (zh) * | 2016-11-09 | 2020-04-14 | 京东方科技集团股份有限公司 | Image upscaling system, training method therefor, and image upscaling method |
CN110348571B (zh) * | 2016-11-29 | 2024-03-29 | 华为技术有限公司 | Neural network model training method, apparatus, chip, and system |
CN109389412B (zh) * | 2017-08-02 | 2022-03-04 | 创新先进技术有限公司 | Method, apparatus, service device, and user equipment for training a model |
CN109800884B (zh) * | 2017-11-14 | 2023-05-26 | 阿里巴巴集团控股有限公司 | Model parameter processing method, apparatus, device, and computer storage medium |
CN108334947A (zh) * | 2018-01-17 | 2018-07-27 | 上海爱优威软件开发有限公司 | Intelligently optimized SGD training method and system |
CN108287763A (zh) * | 2018-01-29 | 2018-07-17 | 中兴飞流信息科技有限公司 | Parameter exchange method, working node, and parameter server system |
CN110187647B (zh) * | 2018-02-23 | 2024-08-16 | 北京京东尚科信息技术有限公司 | Model training method and system |
CN111273953B (zh) * | 2018-11-19 | 2021-07-16 | Oppo广东移动通信有限公司 | Model processing method, apparatus, terminal, and storage medium |
CN109784490B (zh) | 2019-02-02 | 2020-07-03 | 北京地平线机器人技术研发有限公司 | Neural network training method, apparatus, and electronic device |
CN111679912A (zh) * | 2020-06-08 | 2020-09-18 | 广州汇量信息科技有限公司 | Server load balancing method, apparatus, storage medium, and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2004277167A1 (en) * | 2003-09-22 | 2005-04-07 | Kim Hyeung-Yun | Methods for monitoring structural health conditions |
CN100447808C (zh) * | 2007-01-12 | 2008-12-31 | 郑文明 | Classification and semantic evaluation quantization method for facial expression images |
CN101299234B (zh) * | 2008-06-06 | 2011-05-11 | 华南理工大学 | Human eye state recognition method based on an embedded hidden Markov model |
CN104008420A (zh) * | 2014-05-26 | 2014-08-27 | 中国科学院信息工程研究所 | Distributed outlier detection method and system based on autoencoders |
- 2014-10-24: CN CN201410579249.2A patent/CN104346629B/zh (not_active, Expired - Fee Related)
- 2015-04-20: WO PCT/CN2015/076967 patent/WO2016062044A1/fr (active, Application Filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030115325A1 (en) * | 2001-12-18 | 2003-06-19 | Ira Cohen | Adapting Bayesian network parameters on-line in a dynamic environment |
CN103020711A (zh) * | 2012-12-25 | 2013-04-03 | 中国科学院深圳先进技术研究院 | Classifier training method and system |
CN103971163A (zh) * | 2014-05-09 | 2014-08-06 | 哈尔滨工程大学 | Adaptive-learning-rate wavelet neural network control method based on normalized least-mean-square adaptive filtering |
CN104346629A (zh) * | 2014-10-24 | 2015-02-11 | 华为技术有限公司 | Model parameter training method, device and system |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108320026A (zh) * | 2017-05-16 | 2018-07-24 | 腾讯科技(深圳)有限公司 | Machine learning model training method and apparatus |
CN108320026B (zh) * | 2017-05-16 | 2022-02-11 | 腾讯科技(深圳)有限公司 | Machine learning model training method and apparatus |
CN110956018A (zh) * | 2019-11-22 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Training method for a text processing model, text processing method, apparatus, and storage medium |
CN110956018B (zh) * | 2019-11-22 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Training method for a text processing model, text processing method, apparatus, and storage medium |
CN111260079A (zh) * | 2020-01-17 | 2020-06-09 | 南京星火技术有限公司 | Electronic device, agent self-training apparatus, and computer-readable medium |
CN111260079B (zh) * | 2020-01-17 | 2023-05-19 | 南京星火技术有限公司 | Electronic device and agent self-training apparatus |
CN111325354A (zh) * | 2020-03-13 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Machine learning model compression method and apparatus, computer device, and storage medium |
CN111325354B (zh) * | 2020-03-13 | 2022-10-25 | 腾讯科技(深圳)有限公司 | Machine learning model compression method and apparatus, computer device, and storage medium |
CN111400915A (zh) * | 2020-03-17 | 2020-07-10 | 桂林理工大学 | Deep-learning-based sand liquefaction discrimination method and apparatus |
US12020162B2 (en) | 2020-11-30 | 2024-06-25 | International Business Machines Corporation | Weight-based local modulation of weight update in neural networks |
CN113763501A (zh) * | 2021-09-08 | 2021-12-07 | 上海壁仞智能科技有限公司 | Iteration method for an image reconstruction model and image reconstruction method |
CN113763501B (zh) * | 2021-09-08 | 2024-02-27 | 上海壁仞智能科技有限公司 | Iteration method for an image reconstruction model and image reconstruction method |
Also Published As
Publication number | Publication date |
---|---|
CN104346629A (zh) | 2015-02-11 |
CN104346629B (zh) | 2018-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016062044A1 (fr) | Model parameter training method, device and system | |
US11829880B2 (en) | Generating trained neural networks with increased robustness against adversarial attacks | |
WO2023000574A1 (fr) | Model training method, apparatus and device, and readable storage medium | |
WO2018227800A1 (fr) | Neural network training method and device | |
Li et al. | Learning balanced and unbalanced graphs via low-rank coding | |
WO2018095049A1 (fr) | Method and apparatus for generating recommendation results | |
WO2020062770A1 (fr) | Domain dictionary construction method and apparatus, device, and storage medium | |
WO2016062095A1 (fr) | Video ranking method and apparatus | |
WO2023065859A1 (fr) | Item recommendation method and apparatus, and storage medium | |
CN113168559A (zh) | 机器学习模型的自动化生成 | |
JP2018521382A (ja) | 古典的なプロセッサで量子類似計算をエミュレートするためのquanton表現 | |
WO2021089013A1 (fr) | Spatial graph convolutional network training method, electronic device, and storage medium | |
JP5881048B2 (ja) | 情報処理システム、及び、情報処理方法 | |
WO2022105108A1 (fr) | Network data classification method, apparatus and device, and readable storage medium | |
WO2021042857A1 (fr) | Processing method and processing apparatus for an image segmentation model | |
CN113378938B (zh) | 一种基于边Transformer图神经网络的小样本图像分类方法及系统 | |
CN113011568B (zh) | 一种模型的训练方法、数据处理方法及设备 | |
WO2022088390A1 (fr) | Incremental image clustering method and apparatus, electronic device, storage medium, and program product | |
WO2021253938A1 (fr) | Neural network training method and apparatus, and video recognition method and apparatus | |
WO2015188395A1 (fr) | Big-data-oriented metabolome feature data analysis method and system | |
CN112529068B (zh) | 一种多视图图像分类方法、系统、计算机设备和存储介质 | |
Ye et al. | Efficient point cloud segmentation with geometry-aware sparse networks | |
Chen et al. | Distribution knowledge embedding for graph pooling | |
Zhang et al. | DATA: Differentiable architecture approximation with distribution guided sampling | |
Meirom et al. | Optimizing tensor network contraction using reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15853368; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 15853368; Country of ref document: EP; Kind code of ref document: A1